Abstract: We present a combinatorial characterization of the
Bethe entropy function of a factor graph, in contrast to the
original, analytical definition of this function. We achieve this
combinatorial characterization by counting valid configurations
in finite graph covers of the factor graph.

Analogously, we give a combinatorial characterization of the
Bethe partition function, whose original definition was also
analytical in nature. As we point out, our approach has similarities
to the replica method, but also stark differences.

The above findings are a natural backdrop for introducing
a decoder for graph-based codes that we call symbolwise
graph-cover decoding, a decoder that extends our earlier work
on blockwise graph-cover decoding. Both graph-cover decoders
are theoretical tools that help towards a better understanding
of message-passing iterative decoding: blockwise graph-cover
decoding links max-product (min-sum) algorithm decoding
with linear programming decoding, and symbolwise graph-cover
decoding links sum-product algorithm decoding with Bethe free
energy function minimization at temperature one.

In contrast to the Gibbs entropy function, which is a concave
function, the Bethe entropy function is in general not concave
everywhere. In particular, we show that every code picked from
an ensemble of regular low-density parity-check codes with
minimum Hamming distance growing (with high probability)
linearly with the block length has a Bethe entropy function that
is convex in certain regions of its domain.

Index Terms: Bethe approximation, Bethe entropy, Bethe
partition function, graph cover, graph-cover decoding,
message-passing algorithm, method of types, linear programming
decoding, pseudo-marginal vector, sum-product algorithm.



I. Introduction 

What is the meaning of the pseudo-marginal functions
that are computed by the sum-product algorithm,
especially at a fixed point of the sum-product algorithm?
This question stood at the beginning of our investigations.
For factor graphs without cycles the answer is clear, and
was stated succinctly already by Wiberg et al. [1], [2]: the
pseudo-marginal functions at a fixed point of the sum-product
algorithm (SPA) are the correct marginal functions of the
global function that is represented by the factor graph. Note
that here and hereafter we assume that SPA messages are
updated according to the so-called "flooding message update
schedule." For factor graphs of finite size and without cycles
this implies that the SPA reaches a fixed point after a finite
number of iterations [1], [2].

However, in the case of factor graphs with cycles, the answer
is a priori not so clear, even for fixed points of the SPA.¹
Of course, one can express the SPA-based pseudo-marginal
functions as marginal functions of the global functions of
computation-tree factor graphs, the latter being unwrapped
versions of the factor graph under consideration [2].
However, the analysis of these objects has so far proven
to be rather difficult, a main reason being that the
computation trees, along with the global functions represented
by them, change with the iteration number.

A. A Combinatorial Characterization of the Bethe Entropy 
Function in Terms of Finite Graph Covers 

Towards making progress on the above-mentioned question,
this paper studies the Bethe free energy function of a factor
graph, a function that was introduced by Yedidia, Freeman,
and Weiss [3] and whose importance stems from a very
well-known theorem in [3] which states that fixed points of the
SPA correspond to stationary points of the Bethe free energy
function. Consequently, it is clearly desirable to obtain a
better understanding of this function by characterizing it from
different perspectives.

Recall that the Bethe free energy function $F_{\mathrm{B}}$ is defined to be

$$F_{\mathrm{B}}(\boldsymbol{\beta}) = U_{\mathrm{B}}(\boldsymbol{\beta}) - T \cdot H_{\mathrm{B}}(\boldsymbol{\beta}),$$

where $\boldsymbol{\beta}$ is a (locally consistent) pseudo-marginal vector, $U_{\mathrm{B}}$
is the Bethe average energy function, $H_{\mathrm{B}}$ is the Bethe entropy
function, and $T \geq 0$ is the temperature. (All the mathematical
terms appearing in this introduction will be suitably defined
in later sections.) Both $U_{\mathrm{B}}$ and $H_{\mathrm{B}}$ contribute significantly
towards the shape of $F_{\mathrm{B}}$. However, the curvature of $F_{\mathrm{B}}$ is
exclusively determined by the curvature of $H_{\mathrm{B}}$; this is a
consequence of the fact that $U_{\mathrm{B}}$ is a linear function of its
argument. Therefore, characterizing the function $F_{\mathrm{B}}$ is nearly
tantamount to characterizing the function $H_{\mathrm{B}}$.
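
The curvature statement can be written out in one line. The following is only a sketch: it assumes, for illustration, that the functions are twice differentiable in the interior of the domain, and it uses the Hessian symbol $\nabla^2$, which is not notation from the paper.

```latex
% U_B is linear in its argument, so its Hessian vanishes:
\nabla^2 F_{\mathrm{B}}(\boldsymbol{\beta})
  = \underbrace{\nabla^2 U_{\mathrm{B}}(\boldsymbol{\beta})}_{=\,0}
    \;-\; T \cdot \nabla^2 H_{\mathrm{B}}(\boldsymbol{\beta})
  = -\,T \cdot \nabla^2 H_{\mathrm{B}}(\boldsymbol{\beta}).
```

Under these assumptions and for $T > 0$, the Bethe free energy function is convex in a region of its domain exactly when the Bethe entropy function is concave there.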

¹ Here and in the following we assume that the local functions of the factor
graph are non-negative. Moreover, we assume that we are only interested
in SPA-based pseudo-marginal functions that are normalized, i.e.,
pseudo-marginal functions that sum to one. Therefore, without loss of generality, we
may assume that at every iteration the SPA messages are normalized, i.e., that
they sum to one. With this, SPA fixed points are well defined, also for factor
graphs with cycles.

In this paper we offer a combinatorial characterization of
the Bethe entropy function $H_{\mathrm{B}}$ in terms of finite graph covers
of the factor graph under consideration. Recall that in earlier
work [4], [5] we showed that every valid configuration in
some finite graph cover maps down to some pseudo-marginal
vector with rational coordinates, and vice versa, every
pseudo-marginal vector with rational coordinates has at least one
pre-image in some finite graph cover. (Actually, the papers [4],
[5] focused on a special case of graphical models, namely
graphical models that represent binary low-density
parity-check codes, and consequently dealt with pseudo-codewords.
However, the results therein are easily generalized to the more
general setup considered here.) The present paper discusses
the following extension of this result. Namely, letting $\boldsymbol{\beta}$ be
a pseudo-marginal vector with only rational coordinates and
letting $C_M(\boldsymbol{\beta})$ be the average number of pre-images of $\boldsymbol{\beta}$
among all the $M$-covers, we show that $C_M(\boldsymbol{\beta})$ grows, when
$M$ goes to infinity, like

$$C_M(\boldsymbol{\beta}) = \exp\bigl( M \cdot H_{\mathrm{B}}(\boldsymbol{\beta}) + o(M) \bigr).$$

This characterization of the Bethe entropy function clearly has
a "combinatorial flavor," which is in contrast to the "analytical
flavor" of the original definition of the Bethe entropy function
in [3] (see Definition 14 in the present paper).
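
The growth rate above has the same form as the classical counting estimate from the method of types (listed among the index terms): the number of length-$M$ sequences with empirical distribution $p$ is $\exp(M \cdot H(p) + o(M))$, where $H$ is the Shannon entropy in nats. The following minimal Python sketch checks this classical estimate numerically. It is offered only as an analogy for intuition, not as the paper's computation of $C_M(\boldsymbol{\beta})$; the function names and the example distribution are illustrative assumptions.

```python
import math

def entropy_nats(p):
    """Shannon entropy in nats, using the convention 0 * log(0) = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def log_type_class_size(counts):
    """Natural log of the multinomial coefficient M! / prod_i counts[i]!."""
    m = sum(counts)
    return math.lgamma(m + 1) - sum(math.lgamma(c + 1) for c in counts)

# Illustrative distribution (an assumption, not taken from the paper).
p = (0.25, 0.75)
for m in (40, 400, 4000):
    counts = [round(m * x) for x in p]
    rate = log_type_class_size(counts) / m
    # The normalized log-count approaches the entropy as m grows.
    print(m, rate, entropy_nats(p))
```

The printed rates approach the entropy from below, with a gap that shrinks as $M$ grows; this gap plays the role of the $o(M)$ term in the exponent.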

B. A Combinatorial Characterization of the Bethe Partition 
Function in Terms of Finite Graph Covers 

This paper also offers a combinatorial characterization of the
Bethe partition function $Z_{\mathrm{B}}$ in terms of finite graph covers of
the factor graph under consideration. This is again in contrast
to the original, analytical definition of $Z_{\mathrm{B}}$, which defines $Z_{\mathrm{B}}$ via
the minimum of $F_{\mathrm{B}}$. Compare this with the Gibbs partition
function $Z_{\mathrm{G}}$: its definition is combinatorial in the sense that
$Z_{\mathrm{G}}$ is defined as a sum of certain terms. (Of course, the Gibbs
partition function can also be characterized analytically via the
minimum of the Gibbs free energy function $F_{\mathrm{G}}$.)

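More precisely, recall that the Gibbs partition function (or
total sum) of a factor graph $\mathsf{N}$ is

$$Z_{\mathrm{G}}(\mathsf{N}) \triangleq \sum_{\boldsymbol{a}} g(\boldsymbol{a}),$$

where the sum is over all configurations of $\mathsf{N}$ and where $g$
is the global function of $\mathsf{N}$. We show that the Bethe partition
function can be written as follows:

$$Z_{\mathrm{B}}(\mathsf{N}) = \limsup_{M \to \infty} Z_{\mathrm{B},M}(\mathsf{N}), \qquad (1)$$

where

$$Z_{\mathrm{B},M}(\mathsf{N}) \triangleq \sqrt[M]{\bigl\langle Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \bigr\rangle_{\tilde{\mathsf{N}}}}. \qquad (2)$$

Here the expression under the root sign represents the average
of $Z_{\mathrm{G}}(\tilde{\mathsf{N}})$ over all $M$-covers $\tilde{\mathsf{N}}$ of $\mathsf{N}$. Clearly, the expression
for $Z_{\mathrm{B}}(\mathsf{N})$ given by (1)-(2) has a "combinatorial flavor."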

Interestingly, the expression in (2) is based on only two
rather simple concepts (besides the standard mathematical
concepts of taking limits, taking roots, and computing averages):
we only need to define the concept of an $M$-cover of a factor
graph and the concept of the Gibbs partition function of a
factor graph. In our opinion, these concepts are quite a bit
simpler than the ones needed for defining $F_{\mathrm{B}}$ and then $Z_{\mathrm{B}}(\mathsf{N})$
in terms of the minimum of $F_{\mathrm{B}}$. (A technical note on the side:
in order for the minimum of $F_{\mathrm{B}}$ to make sense, we need the
assumption that was stated in Footnote 1, namely that all local
functions of $\mathsf{N}$ are assumed to be non-negative. As we will see,
this assumption is also crucial for showing the equivalence of
the left- and right-hand sides of (1).)

Fig. 1. Blockwise graph-cover decoding forms a bridge between max-product
(min-sum) algorithm decoding and linear programming decoding, the latter
being equivalent to Bethe free energy minimization at temperature T = 0.
(See Section IX for more details.)

Fig. 2. Symbolwise graph-cover decoding forms a bridge between
sum-product algorithm decoding and Bethe free energy minimization at
temperature T = 1. (See Section IX for more details.)

Note that (2) contains a finite sum. For small factor graphs $\mathsf{N}$
and small $M$ this fact can be exploited, for example, to
perform some brute-force computations and come up with
conjectures about the relationship of $Z_{\mathrm{B},M}(\mathsf{N})$ with respect to
(w.r.t.) $Z_{\mathrm{G}}(\mathsf{N})$. Afterwards, one can try to analytically prove
these conjectures about $Z_{\mathrm{B},M}(\mathsf{N})$ for any finite $M$, and thereby
prove a similar conjecture for $Z_{\mathrm{B}}(\mathsf{N})$. This line of reasoning
is especially interesting if the proofs can be extended to hold
for any factor graph within some class of factor graphs.
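
A brute-force computation of this kind can be sketched in a few lines of Python. The sketch assumes the standard construction of an $M$-cover (which the paper defines only in a later section): every function node and every edge is copied $M$ times, and one permutation of the $M$ copies per full-edge decides how the edge copies are attached. It handles only factor graphs without half-edges and without self-loops. The data layout, the function names, and the example weights in `W` are illustrative assumptions, not notation from the paper.

```python
import itertools

def gibbs_partition_sum(nodes, num_edges, alphabet=(0, 1)):
    """Z_G: sum over all configurations of the product of local functions.

    nodes is a list of (edge_ids, g), where g maps the tuple of edge
    values seen by the function node to a non-negative number.
    """
    total = 0.0
    for a in itertools.product(alphabet, repeat=num_edges):
        weight = 1.0
        for edge_ids, g in nodes:
            weight *= g(tuple(a[e] for e in edge_ids))
        total += weight
    return total

def m_covers(nodes, num_edges, M):
    """Yield every M-cover; copy c of edge e gets the index e * M + c."""
    endpoints = [[] for _ in range(num_edges)]
    for f, (edge_ids, _) in enumerate(nodes):
        for e in edge_ids:
            endpoints[e].append(f)
    # Every edge must join two distinct function nodes.
    assert all(len(ep) == 2 and ep[0] != ep[1] for ep in endpoints)
    perms = list(itertools.permutations(range(M)))
    for sigma in itertools.product(perms, repeat=num_edges):
        cover = []
        for f, (edge_ids, g) in enumerate(nodes):
            for m in range(M):
                # Copy c of edge e joins copy c of its first endpoint with
                # copy sigma[e][c] of its second endpoint.
                ids = tuple(
                    e * M + (m if endpoints[e][0] == f else sigma[e].index(m))
                    for e in edge_ids
                )
                cover.append((ids, g))
        yield cover

def degree_m_bethe_partition_sum(nodes, num_edges, M):
    """Z_{B,M}: M-th root of the average of Z_G over all M-covers, as in (2)."""
    z = [gibbs_partition_sum(c, num_edges * M) for c in m_covers(nodes, num_edges, M)]
    return (sum(z) / len(z)) ** (1.0 / M)

# Illustrative single-cycle factor graph: three function nodes, three edges.
W = [[2.0, 1.0], [1.0, 1.0]]

def g(a_f):
    return W[a_f[0]][a_f[1]]

CYCLE = [((2, 0), g), ((0, 1), g), ((1, 2), g)]

z_g = gibbs_partition_sum(CYCLE, 3)
z_b2 = degree_m_bethe_partition_sum(CYCLE, 3, 2)
print(z_g, z_b2)
```

For this single-cycle example, half of the eight 2-covers consist of two disjoint copies of the cycle and the other half form one cycle of twice the length, so the average under the root is $(18^2 + 322)/2 = 323$ and $Z_{\mathrm{B},2}$ is slightly smaller than $Z_{\mathrm{G}} = 18$. The number of covers grows like $(M!)^{|\mathcal{E}|}$, so this approach is practical only for very small graphs and small $M$.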

C. Graph-Cover Decoding 

One of the main motivations of the papers [4], [5] to study
finite graph covers of a factor graph $\mathsf{N}$ was the fact that finite
graph covers of $\mathsf{N}$ look locally the same as $\mathsf{N}$. Consequently,
any locally operating algorithm, like the max-product
algorithm or the sum-product algorithm, "cannot distinguish" if
it is operating on $\mathsf{N}$ or, implicitly, on any of its finite
graph covers. Clearly, for factor graphs with cycles, this
"non-distinguishability" observation implies fundamental limitations
on the conclusions that can be reached by locally operating
algorithms because finite graph covers of such factor graphs
are "non-trivial" in the sense that they contain valid
configurations that "cannot be explained" by valid configurations in
the base factor graph. This is in sharp contrast to factor graphs
without cycles: all $M$-covers of such factor graphs are "trivial"
in the sense that they consist of $M$ independent copies of the
base factor graph, and so the set of valid configurations of any
$M$-cover equals the $M$-fold Cartesian product of the set of
valid configurations of the base factor graph with itself.
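
For cycle-free factor graphs, this triviality of the $M$-covers has an immediate consequence for the degree-$M$ quantity in (2). The short derivation below is a sketch that uses only the definitions stated so far: the global function of $M$ independent copies is the product of the $M$ individual global functions, so the sum over configurations factorizes.

```latex
% Every M-cover \tilde{N} of a cycle-free N consists of M independent copies of N:
Z_{\mathrm{G}}(\tilde{\mathsf{N}}) = \bigl( Z_{\mathrm{G}}(\mathsf{N}) \bigr)^{M}
\quad\Longrightarrow\quad
Z_{\mathrm{B},M}(\mathsf{N})
  = \sqrt[M]{\bigl\langle Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \bigr\rangle_{\tilde{\mathsf{N}}}}
  = \sqrt[M]{\bigl( Z_{\mathrm{G}}(\mathsf{N}) \bigr)^{M}}
  = Z_{\mathrm{G}}(\mathsf{N})
\quad \text{for all } M .
```

Hence, for factor graphs without cycles, the limit in (1) equals the Gibbs partition function; any difference between the two can arise only through the non-trivial covers of factor graphs with cycles.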

In fact, in the context of message-passing iterative decoding
of graph-based codes with cycles, we argued in [5], [6] that
these fundamental limitations of message-passing iterative
decoding, in particular of max-product (min-sum) algorithm
decoding, imply that these decoders behave less like
blockwise maximum a-posteriori decoding (which is equivalent to
minimizing the Gibbs free energy function at temperature
T = 0), but much more like linear programming decoding [7],
[8] (which is equivalent to minimizing the Bethe free energy
function at temperature T = 0). This was done with the help
of a theoretical tool called blockwise graph-cover decoding,
a tool that in [5] was simply called graph-cover decoding.
Namely (see Fig. 1),

• on the one hand we showed the equivalence of blockwise
graph-cover decoding and linear programming decoding,

• on the other hand we argued that blockwise graph-cover
decoding is a good "model" for the behavior of
max-product (min-sum) algorithm decoding.

This latter connection, namely between blockwise graph-cover
decoding and max-product (min-sum) algorithm decoding, is
in general only an approximate one. However, in all cases
where analytical tools are known that exactly characterize the
behavior of max-product algorithm decoding, the connection
between blockwise graph-cover decoding and max-product
(min-sum) algorithm decoding is exact.

In this paper, we define symbolwise graph-cover decoding,
which tries to capture the essential limitations of sum-product
algorithm decoding. It will allow us to argue that for
graph-based codes with cycles, sum-product algorithm decoding
behaves less like symbolwise maximum a-posteriori decoding
(which is equivalent to minimizing the Gibbs free energy
function at temperature T = 1), but much more like an
algorithm that minimizes (at least locally) the Bethe free
energy function at temperature T = 1. This will be done (see
Fig. 2),

• on the one hand, by showing that symbolwise graph-cover
decoding is equivalent to (globally) minimizing the Bethe
free energy function at temperature T = 1,

• and on the other hand, by arguing that symbolwise
graph-cover decoding is a good "model" for the behavior of
sum-product algorithm decoding.

Similar to the above discussion about blockwise graph-cover
decoding, this latter connection, namely between symbolwise
graph-cover decoding and sum-product algorithm decoding,
is in general an approximate one. However, in many cases
where analytical tools are known that exactly characterize the
behavior of sum-product algorithm decoding, the connection
between symbolwise graph-cover decoding and sum-product
algorithm decoding is exact.

In any case, using the combinatorial characterization of
the Bethe entropy mentioned earlier in this introduction, one
can state that a fixed point of the sum-product algorithm
corresponds to a certain pseudo-marginal vector of the factor
graph under consideration: it is, after taking a biasing
channel-output-dependent term properly into account, the
pseudo-marginal vector that has (locally) an extremal number of
pre-images in all $M$-covers, when $M$ goes to infinity.

D. The Shape of the Bethe Entropy Function 

The paper concludes with a section on the concavity, or the
lack thereof, of the Bethe entropy function. Recall that the
Gibbs entropy function is a concave function, and therefore
the Gibbs free energy function is a convex function. However,
the Bethe entropy function of factor graphs with cycles does in
general not exhibit this property. In fact, in this paper we show
that the factor graph associated with any code picked from
Gallager's ensemble of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular low-density
parity-check codes has a Bethe entropy function that is convex in
certain regions of its domain if the ensemble is such that the
minimum Hamming distance of its codes grows (with high
probability) linearly with the block length. This means that
there is a trade-off between two desirable objectives: on the
one hand to pick a code from a code ensemble with linearly
growing minimum Hamming distance, on the other hand to
pick a code whose factor graph has a concave Bethe entropy
function, i.e., a convex Bethe free energy function.

E. Related Work 

Let us briefly discuss some work that is related to the 
content of this paper. 

• Of course, what is called the Bethe approximation in the
context of factor graphs has a long history in physics and
goes back to ideas that were presented in a 1935 paper
by Bethe [9] (see also the 1936 paper by Peierls [10]).
Bethe's approximation therein was mostly an
assumption about the conditional independence between
different sites in a crystalline alloy. Kurata, Kikuchi, and
Watari [11] later on pointed out that this approximation
is exact on what they called a "Bethe lattice." (For further
information on these and related topics in physics, we
recommend, e.g., [12]-[14].)

Let us comment on the Bethe lattice of a lattice. Consider
a lattice $L$ and a factor graph $\mathsf{N}$. In factor-graph language,
if $L$ corresponds to $\mathsf{N}$, then the Bethe lattice $\hat{L}$ of $L$
corresponds to the universal cover $\hat{\mathsf{N}}$ of $\mathsf{N}$, i.e., to a
computation tree of $\mathsf{N}$ with arbitrary root in the limit
of infinitely many iterations. With the help of $\hat{\mathsf{N}}$ it is
possible to give a combinatorial characterization of the
Bethe partition function of $\mathsf{N}$ as some suitably normalized
sum of the global function of $\hat{\mathsf{N}}$ over all its configurations.
However, to make this rigorous, one has to formulate $\hat{\mathsf{N}}$
as a suitable limit of computation trees. This is not too
difficult for very regular factor graphs or for factor graphs
with suitable correlation decay properties. However, for
general factor graphs the limit of the above-mentioned
normalized sum is rather non-trivial. This difficulty is not
surprising given, among other reasons, the fact that
the SPA may asymptotically exhibit many different types
of behaviors (fixed point, periodic, or even "chaotic"),
the fact that copies of factor-graph nodes have different
multiplicities in finite-size computation trees (see, e.g.,
[15]), or the fact that the fraction of leaf nodes among
all nodes in a computation tree is non-vanishing in the limit of
infinitely many iterations. Clearly, the expression in (1)
also contains a limit; however, in our opinion that limit
is significantly simpler. Moreover, many effects that are
responsible for the similarities and differences between
the Bethe and the Gibbs partition function are already
visible in finite graph covers with small cover degree.

• Some computations that we will perform in the present
paper are very similar to the computations that are
necessary to derive the asymptotic growth rate of the
average Hamming weight enumerator of
proto-graph-based ensembles of (generalized) low-density
parity-check (LDPC) codes [16]-[22] (see also the earlier work
on uniform interleavers [23], [24]). However, besides
some brief mention of the fundamental polytope in [16],
these papers do not seem to elaborate on the connection of
their results to the Bethe entropy function. (An exception
is the very recent paper [22].)

• As already stated in the above introduction, the
papers [4], [5] investigated some fundamental limitations
that locally operating algorithms have compared to
globally operating algorithms. It is worthwhile to point out
that Angluin [25], in a paper that was published in 1980,
used a very similar global-vs.-local argument to
characterize networks of processors. Although on a philosophical
level the starting point of her argument is very akin to
the one in [4], [5], her conclusions are quite different in
nature (which is not so surprising given the differences
between her setup and our setup).

Much closer to the approach in [4], [5] is the relatively
recent paper by Ruozzi et al. [26], which showed that
results on the limitations of locally operating algorithms
on Gaussian graphical models (in particular results based
on the concept of walk-summability [27]) can be
re-derived by studying graph covers.

• There are other papers where the Bethe approximation
plays a central role in characterizing the suboptimal
behavior of locally operating algorithms; in particular, let
us mention [28]-[33].

• Although some concepts in the present paper are
evocative of concepts of the replica method (see, e.g., [34]
and [35, Chapter 8], [36, Appendix I]), there are also
stark differences, as will be discussed in Section VII-E.
However, inspired by an earlier version of the present
paper, Mori [37] recently showed an alternative (and
simpler) approach to some computations that are done in
the context of the replica method. See also the follow-up
papers [38], [39].

• Let $\boldsymbol{\theta}$ be a non-negative square matrix and let $\mathrm{perm}(\boldsymbol{\theta})$
be the permanent of this matrix [40]. In the paper [41],
many concepts of the present paper are specialized to
a certain graphical model $\mathsf{N}(\boldsymbol{\theta})$ for which $Z_{\mathrm{G}}(\mathsf{N}(\boldsymbol{\theta})) =
\mathrm{perm}(\boldsymbol{\theta})$ holds. The reformulation of the Bethe partition
function for such graphical models was subsequently used
by Smarandache [42] to give a proof for a conjecture
about pseudo-codewords of LDPC codes.

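After the initial submission of the present paper and
of [41], we became aware of a paper by Greenhill,
Janson, and Ruciński [43] that, in the language of the
present paper, introduces a graphical model $\mathsf{N}'(\boldsymbol{\theta})$ for
which $Z_{\mathrm{G}}(\mathsf{N}'(\boldsymbol{\theta})) = \mathrm{perm}(\boldsymbol{\theta})$ holds and for which they
compute high-order approximations of $Z_{\mathrm{B},M}(\mathsf{N}'(\boldsymbol{\theta}))$.
However, note that $\mathsf{N}'(\boldsymbol{\theta})$ is in general different from
$\mathsf{N}(\boldsymbol{\theta})$. A detailed discussion of connections between $\mathsf{N}(\boldsymbol{\theta})$
and $\mathsf{N}'(\boldsymbol{\theta})$ is given in [41, Section VII-E].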



• Based on the reformulation of the Bethe partition function
in an earlier version of the present paper, Watanabe [44]
stated a conjecture about the relationship of the number of
independent sets of a graph and its Bethe approximation
(along with other similar conjectures), and Ruozzi [45]
proved that the Gibbs partition function of a graphical
model with log-supermodular function nodes is always
lower bounded by its Bethe partition function, thereby
proving a conjecture by Sudderth, Wainwright, and
Willsky [46].

• Graph covers were used in the recent paper [47] to
explain why the Bethe partition function is very close
to the Gibbs partition function for certain graphical
models that appear in the context of constrained coding.
They were also used in [48] to prove properties of the
Bethe approximation of the so-called pattern maximum
likelihood distribution.

F. Overview of the Paper 

This paper is structured as follows. We conclude this first
section with a subsection on notations and definitions. Then,
in Section II we review the basics of normal factor graphs,
i.e., the type of factor graphs that we will use in this paper,
and in Section III we discuss the Gibbs free energy function
and related functions. The Gibbs free energy function is again
the topic of Section IV, where we present a simple setup
where this function arises naturally. Afterwards, in Section V
we move on to introduce the Bethe approximation and the
functions that come with it.

After reviewing the main facts about graph covers in
Section VI, we come to the main part of this paper, namely
Section VII, where we present the promised combinatorial
characterization of the Bethe entropy function and the Bethe
partition function.

In contrast to the previous sections that considered a general
factor graph setup, the next three sections focus on factor
graphs that appear in coding theory. Namely, Section VIII
reviews some relevant concepts, Section IX discusses
blockwise and symbolwise graph-cover decoding, and Section X
investigates the influence of the minimum Hamming distance
of a code upon the Bethe entropy function of its factor graph.

Finally, the paper is concluded in Section XI. The longer
proofs of the lemmas and theorems in the main text are
collected in the appendices.

G. Basic Notations and Definitions 

This subsection discusses the most important notations that 
will be used in this paper. More notational definitions will be 
given in later sections. 

We let $\mathbb{Z}$, $\mathbb{Z}_{\geq 0}$, $\mathbb{Z}_{>0}$, $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, $\mathbb{R}_{>0}$, and $\mathbb{F}_2$ be, respectively,
the ring of integers, the set of non-negative integers, the set
of positive integers, the field of real numbers, the set of
non-negative real numbers, the set of positive real numbers, and
the Galois field of size two. Scalars are denoted by
non-boldface characters, whereas vectors and matrices are denoted by boldface
characters. All logarithms in this paper will be natural
logarithms, and so entropies will be measured in nats. (The only
exceptions are figures where entropies are shown in bits.) As
usually done in information theory, we define $\log(0) \triangleq -\infty$
and $0 \cdot \log(0) \triangleq 0$.

Sets are denoted by calligraphic letters, and the size of a
finite set $\mathcal{S}$ is written as $|\mathcal{S}|$. The convex hull and the conic
hull [49] of some set $\mathcal{R}$ of real vectors are, respectively, denoted by
$\mathrm{conv}(\mathcal{R})$ and $\mathrm{conic}(\mathcal{R})$.

We use square brackets in two different ways. Namely,
for any $L \in \mathbb{Z}_{>0}$ we define $[L] \triangleq \{1, \ldots, L\}$, and for
any statement $S$ we follow Iverson's convention by defining
$[S] \triangleq 1$ if $S$ is true and $[S] \triangleq 0$ otherwise.

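Finally, for a finite set $\mathcal{S}$, we define $\Pi_{\mathcal{S}}$ to be the set of
vectors representing probability mass functions over $\mathcal{S}$, i.e.,

$$\Pi_{\mathcal{S}} \triangleq \Bigl\{ \boldsymbol{p} = (p_s)_{s \in \mathcal{S}} \Bigm| p_s \geq 0 \text{ for all } s \in \mathcal{S}, \ \sum_{s \in \mathcal{S}} p_s = 1 \Bigr\}.$$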



II. Normal Factor Graphs 

Factor graphs are a convenient way to represent multivariate
functions [50]. In this paper we use a variant called normal
factor graphs (NFGs) [51] (also called Forney-style factor
graphs [52]), where variables are associated with edges.

The key aspects of an NFG are best explained with the help 
of an example. 

Example 1 Consider the multivariate function

$$g(a_{e_1}, \ldots, a_{e_8}) \triangleq g_{f_1}(a_{e_1}, a_{e_2}, a_{e_5}) \cdot g_{f_2}(a_{e_2}, a_{e_3}, a_{e_6}) \cdot g_{f_3}(a_{e_3}, a_{e_4}, a_{e_7}) \cdot g_{f_4}(a_{e_5}, a_{e_6}, a_{e_8}) \cdot g_{f_5}(a_{e_7}, a_{e_8}),$$

where the so-called global function $g$ is the product of the
so-called local functions $g_{f_1}$, $g_{f_2}$, $g_{f_3}$, $g_{f_4}$, and $g_{f_5}$. The
decomposition of this global function as a product of local
functions can be depicted with the help of an NFG $\mathsf{N}$ as shown
in Fig. 3. In particular, the NFG $\mathsf{N}$ consists of

• the function nodes $f_1$, $f_2$, $f_3$, $f_4$, and $f_5$;

• the half-edges $e_1$ and $e_4$ (sometimes also called "external
edges");

• the full-edges $e_2$, $e_3$, $e_5$, $e_6$, $e_7$, and $e_8$ (sometimes also
called "internal edges").

In general, 

• a function node f represents the local function gf; 

• with an edge $e$ we associate the variable $A_e$ (note that a
realization of the variable $A_e$ is denoted by $a_e$);

• an edge $e$ is incident on a function node $f$ if and only if
$a_e$ appears as an argument of the local function $g_f$.

Note that the NFG $\mathsf{N}$ contains three cycles, one involving
the edges $e_2$, $e_5$, $e_6$, one involving the edges $e_3$, $e_6$, $e_7$, $e_8$,
and one involving the edges $e_2$, $e_3$, $e_5$, $e_7$, $e_8$. As is well
known from the literature on graphical models, and as we can 
also see from other parts of this paper, the existence/absence 
of cycles in an NFG has significant implications for its 
properties, in particular with respect to the behavior of locally 
operating algorithms like the max-product algorithm and the 
sum-product algorithm. □ 




Fig. 3. NFG $\mathsf{N}$ used in Example 1.



We now present the general definition of an NFG that we
will use in this paper.

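Definition 2 An NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ consists of the following
objects.

• A graph $(\mathcal{F}, \mathcal{E})$ with vertex set $\mathcal{F}$ (also known as the
function node set) and with edge set $\mathcal{E} = \mathcal{E}_{\mathrm{half}} \cup \mathcal{E}_{\mathrm{full}}$,
where $\mathcal{E}_{\mathrm{half}}$ and $\mathcal{E}_{\mathrm{full}}$ represent, respectively, the set of
half-edges and the set of full-edges.

• A collection of alphabets $\mathcal{A} \triangleq \{\mathcal{A}_e\}_{e \in \mathcal{E}}$, where the
alphabet $\mathcal{A}_e$ is associated with the edge $e \in \mathcal{E}$. In the following,
with a slight abuse of notation, $\mathcal{A}$ will also stand for
the set $\mathcal{A} \triangleq \prod_{e \in \mathcal{E}} \mathcal{A}_e$, i.e., the Cartesian product of the
alphabets $\{\mathcal{A}_e\}_{e \in \mathcal{E}}$. Moreover, we also define the sets
$\mathcal{A}_{\mathrm{half}} \triangleq \prod_{e \in \mathcal{E}_{\mathrm{half}}} \mathcal{A}_e$ and $\mathcal{A}_{\mathrm{full}} \triangleq \prod_{e \in \mathcal{E}_{\mathrm{full}}} \mathcal{A}_e$. Clearly,
$\mathcal{A} = \mathcal{A}_{\mathrm{half}} \times \mathcal{A}_{\mathrm{full}}$.

• A collection of functions $\mathcal{G} \triangleq \{g_f\}_{f \in \mathcal{F}}$ (called local
functions), where the function $g_f$ is associated with the
function node $f \in \mathcal{F}$ and further specified below. □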



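Definition 3 Given an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$, we make the
following definitions.

• For every $f \in \mathcal{F}$, we define $\mathcal{E}_f$ to be the set of edges
incident on $f$.

• A vector $\boldsymbol{a}$ in the set $\mathcal{A}$, i.e.,

$$\boldsymbol{a} \triangleq (a_e)_{e \in \mathcal{E}} \in \mathcal{A},$$

will be called a configuration of the NFG. For a given
vector $\boldsymbol{a}$, we also define for every $f \in \mathcal{F}$ the sub-vector

$$\boldsymbol{a}_f \triangleq (a_{f,e})_{e \in \mathcal{E}_f} \triangleq (a_e)_{e \in \mathcal{E}_f}.$$

Note that we will also use the notation $\boldsymbol{a}_f = (a_{f,e})_{e \in \mathcal{E}_f}$
when there is not necessarily an underlying configuration
$\boldsymbol{a}$ of the whole NFG.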

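• For every $f \in \mathcal{F}$, the local function $g_f$ is an arbitrary
mapping

$$g_f: \ \prod_{e \in \mathcal{E}_f} \mathcal{A}_e \to \mathbb{R}, \quad \boldsymbol{a}_f \mapsto g_f(\boldsymbol{a}_f).$$

• For every $f \in \mathcal{F}$, we define the function node alphabet
$\mathcal{A}_f$ to be the set

$$\mathcal{A}_f \triangleq \Bigl\{ \boldsymbol{a}_f \in \prod_{e \in \mathcal{E}_f} \mathcal{A}_e \Bigm| g_f(\boldsymbol{a}_f) \neq 0 \Bigr\}.$$

This set is also known as the local constraint code of the
function node $f$.

• The global function $g$ is defined to be the mapping

$$g: \ \mathcal{A} \to \mathbb{R}, \quad \boldsymbol{a} \mapsto \prod_{f} g_f(\boldsymbol{a}_f).$$

Equivalently, in the case where we distinguish between
half- and full-edges, $g$ represents the mapping

$$g: \ \mathcal{A}_{\mathrm{half}} \times \mathcal{A}_{\mathrm{full}} \to \mathbb{R}, \quad (\boldsymbol{a}_{\mathrm{half}}, \boldsymbol{a}_{\mathrm{full}}) \mapsto \prod_{f} g_f(\boldsymbol{a}_f).$$

• A configuration $\boldsymbol{a}$ with $g(\boldsymbol{a}) \neq 0$ is called a valid
configuration. The set of all valid configurations, i.e.,

$$\mathcal{C} \triangleq \bigl\{ \boldsymbol{a} \in \mathcal{A} \bigm| a_e \in \mathcal{A}_e, \ e \in \mathcal{E}; \ \boldsymbol{a}_f \in \mathcal{A}_f, \ f \in \mathcal{F} \bigr\},$$

is called the global behavior of $\mathsf{N}$, the full behavior of
$\mathsf{N}$, or the edge-based code realized by $\mathsf{N}$.

• The projection of $\mathcal{C}$ onto $\mathcal{E}_{\mathrm{half}}$, i.e.,

$$\mathcal{C}_{\mathrm{half}} \triangleq \bigl\{ (c_e)_{e \in \mathcal{E}_{\mathrm{half}}} \bigm| \boldsymbol{c} \in \mathcal{C} \bigr\} = \bigl\{ \boldsymbol{a}_{\mathrm{half}} \in \mathcal{A}_{\mathrm{half}} \bigm| \text{there exists an } \boldsymbol{a}_{\mathrm{full}} \in \mathcal{A}_{\mathrm{full}} \text{ such that } (\boldsymbol{a}_{\mathrm{half}}, \boldsymbol{a}_{\mathrm{full}}) \in \mathcal{C} \bigr\},$$

is called the half-edge-based code realized by $\mathsf{N}$. □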

A comment concerning the above definition: in the
following, when confusion can arise as to which NFG an object refers
to, we will use more precise notations like $g_{\mathsf{N}}$, $\mathcal{C}(\mathsf{N})$, etc.,
instead of $g$, $\mathcal{C}$, etc.

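Example 4 Consider again the NFG $\mathsf{N}$ that is discussed in
Example 1 and depicted in Fig. 3. It is shown again in Fig. 4.
Assume that its details are as follows:

• The variable alphabets are $\mathcal{A}_e = \{0, 1\}$, $e \in \mathcal{E}$.

• The local functions are

$$g_f(\boldsymbol{a}_f) \triangleq \begin{cases} 1 & \text{if } \sum_{e \in \mathcal{E}_f} a_{f,e} = 0 \ (\mathrm{mod}\ 2) \\ 0 & \text{otherwise} \end{cases}, \quad f \in \mathcal{F}.$$

Therefore the local constraint codes are

$$\mathcal{A}_f = \Bigl\{ \boldsymbol{a}_f \in \{0,1\}^{|\mathcal{E}_f|} \Bigm| \sum_{e \in \mathcal{E}_f} a_{f,e} = 0 \ (\mathrm{mod}\ 2) \Bigr\}, \quad f \in \mathcal{F},$$

i.e., single parity-check codes of length $|\mathcal{E}_f|$.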

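The configuration shown in Fig. 4 corresponds to the
variable assignment

$$\boldsymbol{a} = (a_{e_1}, a_{e_2}, a_{e_3}, a_{e_4}, a_{e_5}, a_{e_6}, a_{e_7}, a_{e_8}) = (1, 0, 0, 1, 1, 0, 1, 1).$$

The configuration $\boldsymbol{a}$ has the following sub-vectors:

$$\boldsymbol{a}_{f_1} = (a_{e_1}, a_{e_2}, a_{e_5}) = (1, 0, 1), \quad \boldsymbol{a}_{f_2} = (a_{e_2}, a_{e_3}, a_{e_6}) = (0, 0, 0),$$

$$\boldsymbol{a}_{f_3} = (a_{e_3}, a_{e_4}, a_{e_7}) = (0, 1, 1), \quad \boldsymbol{a}_{f_4} = (a_{e_5}, a_{e_6}, a_{e_8}) = (1, 0, 1),$$

$$\boldsymbol{a}_{f_5} = (a_{e_7}, a_{e_8}) = (1, 1).$$

Because $\boldsymbol{a}_f \in \mathcal{A}_f$ for all $f \in \mathcal{F}$, the configuration $\boldsymbol{a}$ is a valid
configuration. One can easily check that the global function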




Fig. 4. Configuration $\boldsymbol{a}$ on the NFG $\mathsf{N}$ from Fig. 3: for every $e \in \mathcal{E}$, if
$a_e = 0$ then the edge $e$ is thin and in black, whereas if $a_e = 1$ then the edge
$e$ is thick and in red. (See Example 4 for more details.)



value of $\boldsymbol{a}$ is $g(\boldsymbol{a}) = 1$. (In fact, for this NFG the global
function value of all valid configurations is 1.)

The set of all valid configurations of $\mathsf{N}$ turns out to be

$$\mathcal{C} = \left\{ \begin{array}{ll} (0,0,0,0,0,0,0,0), & (0,1,0,0,1,1,0,0), \\ (0,0,1,0,0,1,1,1), & (0,1,1,0,1,0,1,1), \\ (1,0,0,1,1,0,1,1), & (1,1,0,1,0,1,1,1), \\ (1,0,1,1,1,1,0,0), & (1,1,1,1,0,0,0,0) \end{array} \right\},$$

and its projection onto $\mathcal{E}_{\mathrm{half}} = \{e_1, e_4\}$ is

$$\mathcal{C}_{\mathrm{half}} = \{(0,0), (1,1)\}.$$

□
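
The set of valid configurations in Example 4 can be reproduced by brute force. The following Python sketch encodes the NFG of Example 1 with the single parity-check local functions of Example 4. The data layout (a dictionary from function-node names to edge indices) and all identifier names are illustrative choices, not notation from the paper.

```python
import itertools

# Edges incident on each function node (1-based indices for e1..e8),
# read off from the factorization of the global function in Example 1.
EDGES_OF = {
    "f1": (1, 2, 5),
    "f2": (2, 3, 6),
    "f3": (3, 4, 7),
    "f4": (5, 6, 8),
    "f5": (7, 8),
}
HALF_EDGES = (1, 4)  # e1 and e4 are incident on a single function node

def local_function(a_f):
    """Single parity-check indicator: 1 if the entries sum to 0 mod 2."""
    return 1 if sum(a_f) % 2 == 0 else 0

def global_function(a):
    """Product of all local functions for a configuration a = (a_e1, ..., a_e8)."""
    value = 1
    for edges in EDGES_OF.values():
        value *= local_function(tuple(a[e - 1] for e in edges))
    return value

# Valid configurations are those with non-zero global function value.
valid = [a for a in itertools.product((0, 1), repeat=8) if global_function(a) != 0]

# Projection of the valid configurations onto the half-edges.
half_edge_code = sorted({tuple(a[e - 1] for e in HALF_EDGES) for a in valid})

print(len(valid))
print(half_edge_code)
```

Because every valid configuration of this NFG has global function value 1, the number of valid configurations found by the enumeration also equals the sum of the global function over all configurations.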

Although the definition of NFGs requires the global function 
to be such that all variables are arguments of at most two 
local functions, this does not really impose a major restriction 
on the expressive power of NFGs. Namely, this requirement 
can easily be circumvented by replacing a global function by 
a suitably modified global function that contains additional 
variables and additional local functions. (We refer to [51], [52]
for further details.)

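In the following, when there is no ambiguity, we will use
short-hands such as $\sum_{\boldsymbol{a}}$ and $\prod_{f}$ for $\sum_{\boldsymbol{a} \in \mathcal{A}}$ and $\prod_{f \in \mathcal{F}}$, respectively.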

Assumption 5 For the rest of the paper, we assume that for
all $f \in \mathcal{F}$, the co-domain of the local function $g_f$ is $\mathbb{R}_{\geq 0}$, i.e.,
the set of non-negative real numbers. Consequently, for every
$\boldsymbol{a} \in \mathcal{A}$ it holds that $g(\boldsymbol{a}) \in \mathbb{R}_{\geq 0}$. □

The above assumption is not a significant restriction since 
many interesting problems can be cast in terms of an NFG that 
satisfies this assumption. As will be evident from the upcoming 
sections, the main reason for imposing the above assumption 
is the fact that the definitions of the Gibbs average energy 
function (and therefore the Gibbs free energy function) and the 
Bethe average energy function (and therefore the Bethe free 
energy function) contain expressions that involve the logarithm 
of the global and the local functions. 

However, we stress that the above assumption is not necessary for defining the Gibbs entropy function and the Bethe entropy function. Therefore the upcoming results on these functions hold for the more general setup of Definition 3.

III. The Gibbs Free Energy Function and
the Gibbs Partition Function

This section reviews the concept of the Gibbs free energy function of an NFG, along with some related functions. The temperature $T$ appears in them as a parameter.

Assumption 6 Throughout this paper, the temperature $T$ will be some fixed non-negative real number, i.e., $T \in \mathbb{R}_{\geq 0}$. (If a definition requires $T \in \mathbb{R}_{>0}$, then this will be pointed out.) □

The Gibbs free energy function is defined such that its 
minimal value, along with the location of its minimal value, 
encode important information about the global function that 
is represented by the NFG. In particular, for $T \in \mathbb{R}_{>0}$ the
minimal value equals the temperature T times the negative 
logarithm of the partition function, where the partition function 
is the sum of the (l/T)-th power of the global function over 
all configurations. 

Definition 7 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. For any temperature $T \in \mathbb{R}_{\geq 0}$, the Gibbs free energy function associated
with N is defined to be (see, e.g., ^) 



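$$F_{\mathrm{G}} : \Pi_{\mathcal{C}} \to \mathbb{R}, \quad \mathbf{p} \mapsto U_{\mathrm{G}}(\mathbf{p}) - T \cdot H_{\mathrm{G}}(\mathbf{p}),$$

where

$$U_{\mathrm{G}} : \Pi_{\mathcal{C}} \to \mathbb{R}, \quad \mathbf{p} \mapsto -\sum_{\mathbf{c}} p_{\mathbf{c}} \cdot \log\big(g(\mathbf{c})\big),$$

$$H_{\mathrm{G}} : \Pi_{\mathcal{C}} \to \mathbb{R}, \quad \mathbf{p} \mapsto -\sum_{\mathbf{c}} p_{\mathbf{c}} \cdot \log(p_{\mathbf{c}}).$$

Here, $U_{\mathrm{G}}$ is called the Gibbs average energy function and $H_{\mathrm{G}}$ is called the Gibbs entropy function.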

Moreover, for $T \in \mathbb{R}_{>0}$, the (Gibbs) partition function
associated with N is defined to be (see, e.g., ^) 



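$$Z_{\mathrm{G}} \triangleq \sum_{\mathbf{a} \in \mathcal{A}} g(\mathbf{a})^{1/T} = \sum_{\mathbf{c} \in \mathcal{C}} g(\mathbf{c})^{1/T}. \tag{4}$$

□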



Note that "function" in "partition function" refers to the
fact that the expression in (4) typically is a function of
some parameters like the temperature $T$. (A better word for
"partition function" would possibly be "partition sum" or 
"state sum," which would more closely follow the German 
"Zustandssumme" whose first letter is used to denote the 
partition function.) 

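Lemma 8 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. For $T \in \mathbb{R}_{\geq 0}$, the function $F_{\mathrm{G}}(\mathbf{p})$ is convex in $\mathbf{p}$. Moreover, for $T \in \mathbb{R}_{>0}$, the function $F_{\mathrm{G}}(\mathbf{p})$ is minimized by $\mathbf{p} = \mathbf{p}^*$, where

$$p^*_{\mathbf{c}} \triangleq \frac{g(\mathbf{c})^{1/T}}{Z_{\mathrm{G}}}, \quad \mathbf{c} \in \mathcal{C}. \tag{5}$$

At its minimum, $F_{\mathrm{G}}$ takes on the value

$$F_{\mathrm{G}}(\mathbf{p}^*) = -T \cdot \log(Z_{\mathrm{G}}), \tag{6}$$

which is also known as the Helmholtz free energy.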

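Proof: For $T = 0$ we have $F_{\mathrm{G}}(\mathbf{p}) = U_{\mathrm{G}}(\mathbf{p})$, $\mathbf{p} \in \Pi_{\mathcal{C}}$. The convexity of $F_{\mathrm{G}}$ then follows from the convexity of $U_{\mathrm{G}}$, which follows from the fact that $U_{\mathrm{G}}$ is a linear function of its argument.

For $T \in \mathbb{R}_{>0}$, the Gibbs free energy function $F_{\mathrm{G}}$ can be expressed in terms of a relative entropy functional, namely

$$F_{\mathrm{G}}(\mathbf{p}) = T \cdot D\left( (p_{\mathbf{c}})_{\mathbf{c} \in \mathcal{C}} \,\middle\|\, \left( \frac{g(\mathbf{c})^{1/T}}{Z_{\mathrm{G}}} \right)_{\mathbf{c} \in \mathcal{C}} \right) - T \cdot \log(Z_{\mathrm{G}}).$$

The statements in the lemma then follow easily from standard properties of the relative entropy functional (see, e.g., [53]).

■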
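The statement of Lemma 8 can be checked numerically on a toy instance. The following is a minimal sketch, assuming a hypothetical set of three valid configurations with made-up strictly positive global function values (the list `g` and the temperature `T` are illustrative choices, not taken from the paper); it evaluates $F_{\mathrm{G}}$ at $\mathbf{p}^*$ from (5) and compares the result with (6).

```python
import math

def gibbs_free_energy(p, g, T):
    """F_G(p) = U_G(p) - T * H_G(p) for a pmf p over the valid configurations."""
    U = -sum(pc * math.log(gc) for pc, gc in zip(p, g))
    H = -sum(pc * math.log(pc) for pc in p if pc > 0)
    return U - T * H

def gibbs_minimizer(g, T):
    """Return (p_star, Z_G) with p*_c = g(c)^(1/T) / Z_G, as in (5) and (4)."""
    weights = [gc ** (1.0 / T) for gc in g]
    Z = sum(weights)
    return [w / Z for w in weights], Z

# Hypothetical global function values on three valid configurations (assumption).
g = [1.0, 2.0, 0.5]
T = 0.7

p_star, Z = gibbs_minimizer(g, T)
F_min = gibbs_free_energy(p_star, g, T)        # should equal -T * log(Z_G)
F_other = gibbs_free_energy([0.3, 0.3, 0.4], g, T)  # any other pmf gives a larger value
```

Because $F_{\mathrm{G}}$ is convex, comparing against a single other pmf is of course only a spot check and not a proof of minimality.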

Note that, with appropriate care, results involving the Gibbs free energy function and related functions at temperature $T = 0$ can be recovered from studying the case $T \in \mathbb{R}_{>0}$ and taking the limit $T \downarrow 0$. However, as mentioned in Assumption 6, in this paper the temperature $T$ is a fixed parameter and we will not consider such limits.

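Let us briefly discuss a variant of the above Gibbs free energy function. Namely, for some $\mathbf{c}_{\mathrm{half}} \in \mathcal{C}_{\mathrm{half}}$, we will say that $\mathbf{p} \in \Pi_{\mathcal{C}}$ is compatible with $\mathbf{c}_{\mathrm{half}}$ if

$$p_{\mathbf{c}'} \begin{cases} \geq 0 & \text{(for all } \mathbf{c}' \in \mathcal{C} \text{ with } \mathbf{c}'_{\mathcal{E}_{\mathrm{half}}} = \mathbf{c}_{\mathrm{half}}\text{)}, \\ = 0 & \text{(for all } \mathbf{c}' \in \mathcal{C} \text{ with } \mathbf{c}'_{\mathcal{E}_{\mathrm{half}}} \neq \mathbf{c}_{\mathrm{half}}\text{)}, \end{cases}$$

where $\mathbf{c}'_{\mathcal{E}_{\mathrm{half}}}$ denotes the projection of $\mathbf{c}'$ onto the half-edges.

With this definition, as an alternative to the minimization problem in Lemma 8, we can consider a minimization problem where we minimize over all $\mathbf{p}$ that are compatible with some given $\mathbf{c}_{\mathrm{half}}$. Technically, we can accomplish this by defining a modified Gibbs free energy function $F'_{\mathrm{G}}$ that equals the Gibbs free energy function $F_{\mathrm{G}}$ for $\mathbf{p}$'s which are compatible with this $\mathbf{c}_{\mathrm{half}}$, and that is infinite otherwise.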

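Definition 9 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. For any temperature $T \in \mathbb{R}_{\geq 0}$, the modified Gibbs free energy function associated with $\mathsf{N}$ is defined to be

$$F'_{\mathrm{G}} : \mathcal{C}_{\mathrm{half}} \times \Pi_{\mathcal{C}} \to \mathbb{R} \cup \{+\infty\}, \quad (\mathbf{c}_{\mathrm{half}}, \mathbf{p}) \mapsto \begin{cases} F_{\mathrm{G}}(\mathbf{p}) & \text{($\mathbf{p}$ is compatible with $\mathbf{c}_{\mathrm{half}}$)}, \\ +\infty & \text{(otherwise)}. \end{cases}$$

Moreover, for $T \in \mathbb{R}_{>0}$, the modified (Gibbs) partition function associated with $\mathsf{N}$ is defined to be

$$Z'_{\mathrm{G}} : \mathcal{C}_{\mathrm{half}} \to \mathbb{R}, \quad \mathbf{c}_{\mathrm{half}} \mapsto \sum_{\mathbf{c}' \in \mathcal{C}:\ \mathbf{c}'_{\mathcal{E}_{\mathrm{half}}} = \mathbf{c}_{\mathrm{half}}} g(\mathbf{c}')^{1/T}.$$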

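Lemma 10 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ and fix some $\mathbf{c}_{\mathrm{half}} \in \mathcal{C}_{\mathrm{half}}$. For $T \in \mathbb{R}_{\geq 0}$, the function $F'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}}, \mathbf{p})$ is convex in $\mathbf{p}$. Moreover, for $T \in \mathbb{R}_{>0}$, the function $F'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}}, \mathbf{p})$ is minimized by $\mathbf{p} = \mathbf{p}^*$, where

$$p^*_{\mathbf{c}} \triangleq \begin{cases} \dfrac{g(\mathbf{c})^{1/T}}{Z'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}})} & \text{(if } \mathbf{c}_{\mathcal{E}_{\mathrm{half}}} = \mathbf{c}_{\mathrm{half}}\text{)}, \\ 0 & \text{(otherwise)}. \end{cases}$$

At its minimum, $F'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}}, \cdot\,)$ takes on the value

$$F'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}}, \mathbf{p}^*) = -T \cdot \log\big(Z'_{\mathrm{G}}(\mathbf{c}_{\mathrm{half}})\big).$$

Proof: Similar to the proof of Lemma 8. ■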

Clearly, if the NFG $\mathsf{N}$ does not contain any half-edges, then the modified Gibbs partition function is essentially a scalar. Note that this modified Gibbs partition function (at temperature $T = 1$) equals what Forney [51] calls the partition function of an NFG and what Al-Bashabsheh and Mao [55] call the exterior function of an NFG.

IV. Why the Gibbs Free Energy Function Arises 
Rather Naturally 

Of course, besides the Gibbs free energy function $F_{\mathrm{G}}$, there are many ways to formulate a function $F : \Pi_{\mathcal{C}} \to \mathbb{R}$ such that

• the minimum of $F(\mathbf{p})$ is achieved at $\mathbf{p} = \mathbf{p}^*$, where $\mathbf{p}^*$ equals the expression in (5),

• and such that the minimum value $F(\mathbf{p}^*)$ equals the expression in (6).

The goal of this section is to discuss a setup where $F_{\mathrm{G}}$ arises rather naturally as a function that has these properties. This section also reviews some important concepts from the method of types (see, e.g., [53], [56]), many of which inspired the concepts that were briefly mentioned in Section I and that will be introduced more thoroughly in Sections VI and VII. Note that we do not make any novelty claims for the observations discussed in the present section.

Throughout this section we consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. For simplicity we consider only the case where $\mathcal{A}_e = \{0, 1\} \subset \mathbb{R}$ for all $e \in \mathcal{E}$. Moreover, throughout this discussion, the temperature will be fixed to $T = 1$. (Neither assumption is critical, and with a suitable formalism, the results can easily be generalized.)

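We start by defining a probability mass function $P_{\mathbf{C}}(\mathbf{c})$ on $\mathcal{C}$ that is induced by the global function on $\mathsf{N}$, namely

$$P_{\mathbf{C}}(\mathbf{c}) \triangleq \frac{g(\mathbf{c})}{Z_{\mathrm{G}}}, \quad \mathbf{c} \in \mathcal{C},$$

where $Z_{\mathrm{G}}$ is defined in (4). Now, assume that for every $e \in \mathcal{E}$ and $a_e \in \mathcal{A}_e$ we want to compute the marginal

$$P_{A_e}(a_e) = \sum_{\mathbf{c} \in \mathcal{C}:\ c_e = a_e} P_{\mathbf{C}}(\mathbf{c}).$$

(This computational problem comes up, for example, as part of symbolwise maximum a-posteriori decoding, cf. Section IX-C.) To that end, define the vector $\boldsymbol{\eta} \triangleq (\eta_e)_{e \in \mathcal{E}}$ with components $\eta_e \triangleq P_{A_e}(1)$. Clearly, once we have computed the vector $\boldsymbol{\eta}$, we have all the desired marginals because $P_{A_e}(0) = 1 - P_{A_e}(1) = 1 - \eta_e$, $e \in \mathcal{E}$. It can easily be verified that $\boldsymbol{\eta}$ satisfies

$$\boldsymbol{\eta} = \sum_{\mathbf{c} \in \mathcal{C}} P_{\mathbf{C}}(\mathbf{c}) \cdot \mathbf{c} = \mathrm{E}[\mathbf{C}]. \tag{7}$$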

In practice, the sum in (7) is very often intractable because the set $\mathcal{C}$ is very large. However, (7) shows that $\boldsymbol{\eta}$ is some expectation value and so we can try to approximate it by stochastic averaging. Therefore, let

$$\mathbf{c}_{\mathrm{sample}}(1),\ \mathbf{c}_{\mathrm{sample}}(2),\ \ldots,\ \mathbf{c}_{\mathrm{sample}}(M)$$

be $M$ i.i.d. sample vectors distributed according to $P_{\mathbf{C}}$. Then

$$\boldsymbol{\eta} \approx \frac{1}{M} \sum_{m \in [M]} \mathbf{c}_{\mathrm{sample}}(m). \tag{8}$$

(This expression is akin to the expression in [35, Section 13.2] on "decoding by sampling.") This approach can work well for certain NFGs. However, in general it is, unfortunately, difficult to efficiently obtain enough i.i.d. samples so that $\boldsymbol{\eta}$ can be estimated with sufficient accuracy. The difficulty of generating i.i.d. samples arises, for example, in the case of NFGs that represent good codes, see the discussion in [35, Section 13.2]. Therefore, the expression in (8) does not offer a shortcut for the main computational step in symbolwise maximum a-posteriori decoding. (Of course, this observation is not really surprising given the well-known computational complexity of that decoder.)
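To make the comparison between (7) and (8) concrete, here is a minimal sketch for the small NFG of Example 4, whose eight valid configurations all have global function value 1. The function names, the sample size, and the seed are illustrative assumptions; for this toy NFG the set $\mathcal{C}$ is tiny, so sampling is of course unnecessary and is shown only to illustrate the estimator.

```python
import random

# The eight valid configurations of the example NFG; each has g(c) = 1.
C = [
    (0, 0, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 1, 1, 0, 0),
    (0, 0, 1, 0, 0, 1, 1, 1), (0, 1, 1, 0, 1, 0, 1, 1),
    (1, 0, 0, 1, 1, 0, 1, 1), (1, 1, 0, 1, 0, 1, 1, 1),
    (1, 0, 1, 1, 1, 1, 0, 0), (1, 1, 1, 1, 0, 0, 0, 0),
]
g = [1.0] * len(C)

def exact_mean(configs, weights):
    """eta = sum_c P_C(c) * c with P_C(c) = g(c) / Z_G, as in (7)."""
    Z = sum(weights)
    n = len(configs[0])
    return [sum(w * c[i] for c, w in zip(configs, weights)) / Z for i in range(n)]

def sampled_mean(configs, weights, M, rng):
    """Stochastic average over M i.i.d. samples drawn from P_C, as in (8)."""
    samples = rng.choices(configs, weights=weights, k=M)
    n = len(configs[0])
    return [sum(s[i] for s in samples) / M for i in range(n)]

eta_exact = exact_mean(C, g)
eta_estimate = sampled_mean(C, g, 20000, random.Random(0))
```

Here drawing i.i.d. samples is trivial because $\mathcal{C}$ can be listed explicitly; the difficulty mentioned above arises precisely when such a listing is infeasible.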

Nevertheless, conceptually the expression in (8) is very useful as it suggests the following considerations that will lead to a function that fulfills the promises that were stated at the beginning of this section. Namely, let

$$\mathbf{C}_1,\ \ldots,\ \mathbf{C}_M$$

be $M$ i.i.d. random vectors with distribution $P_{\mathbf{C}}$. Then

$$\boldsymbol{\eta} \overset{(a)}{=} \mathrm{E}\left[ \frac{1}{M} \sum_{m \in [M]} \mathbf{C}_m \right] = \sum_{\mathbf{c}_1 \in \mathcal{C}} \cdots \sum_{\mathbf{c}_M \in \mathcal{C}} \left( \prod_{m \in [M]} P_{\mathbf{C}}(\mathbf{c}_m) \right) \cdot \frac{1}{M} \sum_{m \in [M]} \mathbf{c}_m, \tag{9}$$

where step (a) follows trivially from the definition of $\mathbf{C}_m$, $m \in [M]$, and was inspired by (8). As simple as it is, this step is actually the only "non-trivial" step in the whole discussion here. The rest will simply be a "mechanical" application of the method of types (see, e.g., [53], [56]) towards simplifying the expression in (9).

Therefore, let us recall the relevant definitions from the 
method of types. 

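Definition 11 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ and fix some integer $M \in \mathbb{Z}_{>0}$.

• (Mapping) Define the mapping

$$\varphi_M : \mathcal{C}^M \to \Pi_{\mathcal{C}}, \quad \underline{\mathbf{c}} = (\mathbf{c}_m)_{m \in [M]} \mapsto \mathbf{q}^{(\underline{\mathbf{c}})},$$

where

$$q^{(\underline{\mathbf{c}})}_{\mathbf{c}} \triangleq \frac{1}{M} \cdot \big(\text{number of appearances of } \mathbf{c} \text{ in } \underline{\mathbf{c}}\big) = \frac{1}{M} \sum_{m \in [M]} [\mathbf{c}_m = \mathbf{c}], \quad \mathbf{c} \in \mathcal{C}.$$

(In the above expression we have used Iverson's convention that was defined in Section I-G.)

• (Type) Let $\underline{\mathbf{c}}$ be a sequence over $\mathcal{C}$ of length $M$, i.e., $\underline{\mathbf{c}} = (\mathbf{c}_m)_{m \in [M]} \in \mathcal{C}^M$. Then the vector

$$\mathbf{q}^{(\underline{\mathbf{c}})} = \varphi_M(\underline{\mathbf{c}})$$

is called the type of $\underline{\mathbf{c}}$, or the empirical probability distribution of $\underline{\mathbf{c}}$.

• (Set of all possible types) The set $\mathcal{Q}_M \subseteq \Pi_{\mathcal{C}}$ is defined to be the set of all possible types that are based on sequences over $\mathcal{C}$ of length $M$, i.e.,

$$\mathcal{Q}_M \triangleq \big\{ \varphi_M(\underline{\mathbf{c}}) \bigm| \underline{\mathbf{c}} \in \mathcal{C}^M \big\}.$$

• (Type class) For any $\mathbf{q} \in \mathcal{Q}_M$, the type class of $\mathbf{q}$ is defined to be the set of all vectors in $\mathcal{C}^M$ with type $\mathbf{q}$. Equivalently, the type class of $\mathbf{q}$ is defined to be the pre-image of $\mathbf{q}$ under the mapping $\varphi_M$, i.e.,

$$\mathcal{T}_M(\mathbf{q}) \triangleq \varphi_M^{-1}(\mathbf{q}) = \big\{ \underline{\mathbf{c}} \in \mathcal{C}^M \bigm| \varphi_M(\underline{\mathbf{c}}) = \mathbf{q} \big\}.$$

• (Mean vector) Assume $\mathcal{A}_e = \{0, 1\} \subset \mathbb{R}$ for all $e \in \mathcal{E}$. For any type $\mathbf{q} \in \mathcal{Q}_M$, the mean vector associated with $\mathbf{q}$ is defined to be

$$\mathrm{mean}(\mathbf{q}) \triangleq \sum_{\mathbf{c} \in \mathcal{C}} q_{\mathbf{c}} \cdot \mathbf{c}.$$

□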



The following lemma contains some well-known properties 
of the objects that were introduced in the above definition. 



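Lemma 12 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ and fix some integer $M \in \mathbb{Z}_{>0}$.

• The size of the set $\mathcal{Q}_M$ is upper bounded as follows

$$|\mathcal{Q}_M| \leq (M + 1)^{|\mathcal{C}|}.$$

Because $|\mathcal{C}|$ is a fixed number for a given NFG $\mathsf{N}$, this upper bound is a polynomial in $M$.

• Let $\underline{\mathbf{c}} = (\mathbf{c}_m)_{m \in [M]} \in \mathcal{C}^M$, i.e., $\underline{\mathbf{c}}$ is a sequence over $\mathcal{C}$ of length $M$. Then

$$\prod_{m \in [M]} P_{\mathbf{C}}(\mathbf{c}_m) = \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\Big( -M \cdot U_{\mathrm{G}}\big(\mathbf{q}^{(\underline{\mathbf{c}})}\big) \Big),$$

where $U_{\mathrm{G}}$ and $Z_{\mathrm{G}}$ are, respectively, the Gibbs average energy function and the Gibbs partition function associated with $\mathsf{N}$, see Definition 7.

• For any type $\mathbf{q} \in \mathcal{Q}_M$, let $C_M(\mathbf{q})$ be the size of the type class $\mathcal{T}_M(\mathbf{q})$ of $\mathbf{q}$. Then

$$C_M(\mathbf{q}) = \exp\big( M \cdot H_{\mathrm{G}}(\mathbf{q}) + o(M) \big),$$

where $H_{\mathrm{G}}$ is the Gibbs entropy function associated with $\mathsf{N}$, see Definition 7. (Because $C_M(\mathbf{q})$ is counting certain objects, this characterization of the Gibbs entropy function has clearly a "combinatorial flavor," which is in contrast to the "analytical flavor" of the Gibbs entropy function in Definition 7.)

• Let $\underline{\mathbf{c}} = (\mathbf{c}_m)_{m \in [M]} \in \mathcal{C}^M$, i.e., $\underline{\mathbf{c}}$ is a sequence over $\mathcal{C}$ of length $M$. Then

$$\frac{1}{M} \sum_{m \in [M]} \mathbf{c}_m = \mathrm{mean}\big(\mathbf{q}^{(\underline{\mathbf{c}})}\big).$$

Proof: See, e.g., [53]. ■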
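The objects in Definition 11 and the first three statements of Lemma 12 can be explored by brute force for very small parameters. The sketch below assumes a hypothetical toy setup with three valid configurations (labelled `"c1"`, `"c2"`, `"c3"`) and made-up global function values; types are stored as integer count vectors $M \cdot \mathbf{q}$ to avoid floating-point keys, and the exact type class size is the multinomial coefficient, of which $\exp(M \cdot H_{\mathrm{G}}(\mathbf{q}) + o(M))$ is the asymptotic form.

```python
from collections import Counter
from itertools import product
from math import exp, factorial, log, prod

def type_counts(seq, alphabet):
    """Return M * q as integer counts, where q is the type of the sequence seq."""
    counts = Counter(seq)
    return tuple(counts[c] for c in alphabet)

def type_class_size(counts):
    """Exact size of a type class: the multinomial coefficient M! / prod(k!)."""
    size = factorial(sum(counts))
    for k in counts:
        size //= factorial(k)
    return size

# Hypothetical toy setup (assumption): three valid configurations and their g(c).
alphabet = ("c1", "c2", "c3")
g = {"c1": 1.0, "c2": 2.0, "c3": 0.5}
M = 4
Z = sum(g.values())

# Group all |C|^M sequences by their type; classes[q] is the type class size.
classes = Counter(type_counts(seq, alphabet) for seq in product(alphabet, repeat=M))

def sequence_probability(seq):
    """Product of P_C(c_m) over the sequence, computed directly."""
    return prod(g[c] / Z for c in seq)

def probability_from_type(counts):
    """The same product, computed from the type via Z_G^{-M} * exp(-M * U_G(q))."""
    U = -sum((k / M) * log(g[c]) for c, k in zip(alphabet, counts))
    return exp(-M * U) / Z ** M
```

For these parameters there are 15 types, well below the bound $(M+1)^{|\mathcal{C}|} = 125$.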

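With this, the $M$-fold summation $\sum_{\mathbf{c}_1} \cdots \sum_{\mathbf{c}_M}$ in (9) can be replaced by the double summation $\sum_{\mathbf{q} \in \mathcal{Q}_M} \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})}$, and we obtain

$$\begin{aligned} \boldsymbol{\eta} &= \sum_{\mathbf{q} \in \mathcal{Q}_M} \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} \left( \prod_{m \in [M]} P_{\mathbf{C}}(\mathbf{c}_m) \right) \cdot \frac{1}{M} \sum_{m \in [M]} \mathbf{c}_m \\ &\overset{(a),(b)}{=} \sum_{\mathbf{q} \in \mathcal{Q}_M} \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \cdot \mathrm{mean}(\mathbf{q}) \\ &\overset{(c)}{=} \sum_{\mathbf{q} \in \mathcal{Q}_M} \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \cdot \mathrm{mean}(\mathbf{q}) \cdot \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} 1 \\ &\overset{(d)}{=} \sum_{\mathbf{q} \in \mathcal{Q}_M} \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \cdot \mathrm{mean}(\mathbf{q}) \cdot C_M(\mathbf{q}) \\ &\overset{(e)}{=} \sum_{\mathbf{q} \in \mathcal{Q}_M} s_M(\mathbf{q}) \cdot \mathrm{mean}(\mathbf{q}), \end{aligned} \tag{10}$$

where at steps (a) and (b) we have used Lemma 12, where at step (c) we have used the fact that the terms appearing in the summation are independent of $\underline{\mathbf{c}}$ given their type $\mathbf{q}$, where at step (d) we have used Lemma 12, and where at step (e) we have used the abbreviation

$$s_M(\mathbf{q}) \triangleq \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \cdot C_M(\mathbf{q}). \tag{11}$$

Similarly, we obtain

$$\begin{aligned} 1 &= \sum_{\mathbf{c}_1 \in \mathcal{C}} \cdots \sum_{\mathbf{c}_M \in \mathcal{C}} \prod_{m \in [M]} P_{\mathbf{C}}(\mathbf{c}_m) \\ &= \sum_{\mathbf{q} \in \mathcal{Q}_M} \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} \prod_{m \in [M]} P_{\mathbf{C}}(\mathbf{c}_m) \\ &= \sum_{\mathbf{q} \in \mathcal{Q}_M} \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \\ &= \sum_{\mathbf{q} \in \mathcal{Q}_M} \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot U_{\mathrm{G}}(\mathbf{q}) \big) \cdot \sum_{\underline{\mathbf{c}} \in \mathcal{T}_M(\mathbf{q})} 1 \\ &= \sum_{\mathbf{q} \in \mathcal{Q}_M} s_M(\mathbf{q}), \end{aligned} \tag{12}$$

i.e., $s_M$ is a probability mass function on $\mathcal{Q}_M$. Moreover, using Lemma 12, it follows from (11) that

$$\begin{aligned} s_M(\mathbf{q}) &= \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\Big( -M \cdot \big( U_{\mathrm{G}}(\mathbf{q}) - H_{\mathrm{G}}(\mathbf{q}) \big) + o(M) \Big) \\ &= \frac{1}{Z_{\mathrm{G}}^M} \cdot \exp\big( -M \cdot F_{\mathrm{G}}(\mathbf{q}) + o(M) \big), \quad \mathbf{q} \in \mathcal{Q}_M. \end{aligned} \tag{13}$$

Because (10) holds for any $M$, we might as well take the limit $M \to \infty$. Then, because in the limit $M \to \infty$ the probability mass function $s_M$ is concentrated more and more around

$$\mathbf{q}^* \triangleq \operatorname*{arg\,min}_{\mathbf{q} \in \Pi_{\mathcal{C}}} F_{\mathrm{G}}(\mathbf{q}),$$

and because the size of $\mathcal{Q}_M$ grows at most polynomially in $M$, it follows that in the limit $M \to \infty$ the sum in (10) can be simplified, and we obtain

$$\boldsymbol{\eta} = \mathrm{mean}(\mathbf{q}^*).$$

Moreover, using (13), the expression in (12) can be rewritten for finite $M$ to read

$$Z_{\mathrm{G}}^M = \sum_{\mathbf{q} \in \mathcal{Q}_M} \exp\big( -M \cdot F_{\mathrm{G}}(\mathbf{q}) + o(M) \big).$$

Taking the limit $M \to \infty$, and using the fact that the size of $\mathcal{Q}_M$ grows at most polynomially in $M$, we can write

$$Z_{\mathrm{G}} = \exp\big( -F_{\mathrm{G}}(\mathbf{q}^*) \big) = \exp\Big( -\min_{\mathbf{q} \in \Pi_{\mathcal{C}}} F_{\mathrm{G}}(\mathbf{q}) \Big),$$

or, equivalently,

$$\log(Z_{\mathrm{G}}) = -\min_{\mathbf{q} \in \Pi_{\mathcal{C}}} F_{\mathrm{G}}(\mathbf{q}).$$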
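The normalization of $s_M$ and its concentration around the minimizer of $F_{\mathrm{G}}$ can be observed numerically in the simplest possible case. The sketch below assumes a hypothetical NFG with only two valid configurations whose global function values `g0` and `g1` are positive integers (an illustrative choice that keeps the arithmetic exact until the final division), so that a type is determined by the number $k$ of appearances of the second configuration.

```python
from math import comb

def s_M(k, M, g0, g1):
    """Exact s_M(q) for the type q = (1 - k/M, k/M); g0, g1 are positive integers."""
    # Z_G^{-M} * exp(-M * U_G(q)) equals g0^(M-k) * g1^k / Z_G^M at T = 1,
    # and the type class size C_M(q) is the binomial coefficient comb(M, k).
    return comb(M, k) * g0 ** (M - k) * g1 ** k / (g0 + g1) ** M

def total_mass(M, g0, g1):
    """Sum of s_M over all types; should equal 1."""
    return sum(s_M(k, M, g0, g1) for k in range(M + 1))

def most_likely_type(M, g0, g1):
    """Second component k/M of the type that maximizes s_M."""
    k_best = max(range(M + 1), key=lambda k: s_M(k, M, g0, g1))
    return k_best / M

g0, g1 = 1, 3                 # hypothetical values, so that p* = (1/4, 3/4)
mass = total_mass(400, g0, g1)
peak = most_likely_type(400, g0, g1)
```

For this choice the peak of $s_M$ sits at the type $(1/4, 3/4)$, which is exactly $\mathbf{p}^*$ from (5) at $T = 1$.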



We conclude this section with some remarks. 

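• Note again that the only "non-trivial" step in the above derivation was step (a) in (9).

• Recall that we assumed that $T = 1$; however, the results in this section can easily be generalized to $T \in \mathbb{R}_{>0}$.

• We could have started this section by defining

$$P_{\mathbf{A}}(\mathbf{a}) \triangleq \frac{g(\mathbf{a})}{Z_{\mathrm{G}}}, \quad \mathbf{a} \in \mathcal{A},$$

and then we could have continued by replacing $\mathcal{C}$ by $\mathcal{A}$, $\Pi_{\mathcal{C}}$ by $\Pi_{\mathcal{A}}$, etc. (Clearly, $P_{\mathbf{A}}(\mathbf{a}) = P_{\mathbf{C}}(\mathbf{a})$ if $\mathbf{a} \in \mathcal{C}$ and $P_{\mathbf{A}}(\mathbf{a}) = 0$ if $\mathbf{a} \notin \mathcal{C}$.) Both approaches, the approach taken in this section and this alternative approach, have their advantages and disadvantages, but ultimately they yield equivalent results.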

V. The Local Marginal Polytope and
the Bethe Approximation

In many problems it is desirable to compute the Gibbs partition function of some graphical model. However, the direct evaluation of (4) is usually intractable. Moreover, although the above reformulation of the Gibbs partition function via the minimum value of some function, see Lemma 8, is an elegant reformulation of the Gibbs partition function computation problem, this does not yield any computational savings yet.

Nevertheless, it suggests looking for a function that is
tractable and whose minimum is close to the minimum of 
the Gibbs free energy function. An ansatz for such a function 
is the so-called Bethe free energy function |T|. The Bethe free 
energy function is interesting because a theorem by Yedidia, 
Freeman, and Weiss Q says that fixed points of the sum- 
product algorithm correspond to stationary points of the Bethe 
free energy function. (For further motivations for the Bethe 
approximation we refer to the discussion in [3], [57], [58].)

Before we can state the definition of the Bethe free energy 
function, we need the concept of the local marginal polytope. 

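Definition 13 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG and let

$$\boldsymbol{\beta} \triangleq \big( (\boldsymbol{\beta}_f)_{f \in \mathcal{F}},\ (\boldsymbol{\beta}_e)_{e \in \mathcal{E}} \big)$$

be a collection of vectors based on the real vectors

$$\boldsymbol{\beta}_f \triangleq (\beta_{f,\mathbf{a}_f})_{\mathbf{a}_f \in \mathcal{A}_f}, \qquad \boldsymbol{\beta}_e \triangleq (\beta_{e,a_e})_{a_e \in \mathcal{A}_e}.$$

Then, for $f \in \mathcal{F}$, the $f$th local marginal polytope (or $f$th belief polytope) $\mathcal{B}_f$ is defined to be the set

$$\mathcal{B}_f \triangleq \Big\{ \boldsymbol{\beta}_f \Bigm| \beta_{f,\mathbf{a}_f} \geq 0 \text{ for all } \mathbf{a}_f \in \mathcal{A}_f,\ \sum_{\mathbf{a}_f} \beta_{f,\mathbf{a}_f} = 1 \Big\},$$

and for $e \in \mathcal{E}$, the $e$th local marginal polytope (or $e$th belief polytope) $\mathcal{B}_e$ is defined to be the set

$$\mathcal{B}_e \triangleq \Big\{ \boldsymbol{\beta}_e \Bigm| \beta_{e,a_e} \geq 0 \text{ for all } a_e \in \mathcal{A}_e,\ \sum_{a_e} \beta_{e,a_e} = 1 \Big\}.$$

With this, the local marginal polytope (or belief polytope) $\mathcal{B}$ is defined to be the set

$$\mathcal{B} \triangleq \left\{ \boldsymbol{\beta} \ \middle|\ \begin{array}{l} \boldsymbol{\beta}_f \in \mathcal{B}_f \text{ for all } f \in \mathcal{F}, \\ \boldsymbol{\beta}_e \in \mathcal{B}_e \text{ for all } e \in \mathcal{E}, \\ \sum_{\mathbf{a}'_f \in \mathcal{A}_f:\ a'_{f,e} = a_e} \beta_{f,\mathbf{a}'_f} = \beta_{e,a_e} \\ \quad \text{for all } f \in \mathcal{F},\ e \in \mathcal{E}_f,\ a_e \in \mathcal{A}_e \end{array} \right\},$$

where $\boldsymbol{\beta} \in \mathcal{B}$ is called a pseudo-marginal vector, or more precisely, a locally consistent pseudo-marginal vector. The constraints that were listed last in the definition of $\mathcal{B}$ will be called "edge consistency constraints." □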

Definition 14 For any temperature $T \in \mathbb{R}_{\geq 0}$, the Bethe free energy function associated with some NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ is
defined to be the function (see HS]!) 

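$$F_{\mathrm{B}} : \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto U_{\mathrm{B}}(\boldsymbol{\beta}) - T \cdot H_{\mathrm{B}}(\boldsymbol{\beta}),$$

where

$$U_{\mathrm{B}} : \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto \sum_{f} U_{\mathrm{B},f}(\boldsymbol{\beta}_f),$$

$$H_{\mathrm{B}} : \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto \sum_{f} H_{\mathrm{B},f}(\boldsymbol{\beta}_f) - \sum_{e \in \mathcal{E}_{\mathrm{full}}} H_{\mathrm{B},e}(\boldsymbol{\beta}_e),$$

with

$$U_{\mathrm{B},f} : \mathcal{B}_f \to \mathbb{R}, \quad \boldsymbol{\beta}_f \mapsto -\sum_{\mathbf{a}_f} \beta_{f,\mathbf{a}_f} \cdot \log\big(g_f(\mathbf{a}_f)\big),$$

$$H_{\mathrm{B},f} : \mathcal{B}_f \to \mathbb{R}, \quad \boldsymbol{\beta}_f \mapsto -\sum_{\mathbf{a}_f} \beta_{f,\mathbf{a}_f} \cdot \log(\beta_{f,\mathbf{a}_f}),$$

$$H_{\mathrm{B},e} : \mathcal{B}_e \to \mathbb{R}, \quad \boldsymbol{\beta}_e \mapsto -\sum_{a_e} \beta_{e,a_e} \cdot \log(\beta_{e,a_e}).$$

Here, $U_{\mathrm{B}}$ is the Bethe average energy function and $H_{\mathrm{B}}$ is the Bethe entropy function. □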

Note that in the above definition of $H_{\mathrm{B}}(\boldsymbol{\beta})$, the term $H_{\mathrm{B},e}(\boldsymbol{\beta}_e)$ appears with coefficient $-1$ for full-edges $e \in \mathcal{E}_{\mathrm{full}}$, whereas it appears with coefficient $0$ for half-edges $e \in \mathcal{E}_{\mathrm{half}}$. Therefore, the latter terms are omitted.

^These coefficients are consistent with the coefficients in [5]. Namely, because half/full-edges in NFGs correspond to variable nodes of degree one/two in factor graphs [52], and because the Bethe entropy function term corresponding to a degree-$d$ variable node in a factor graph appears with coefficient $-(d - 1)$ in the Bethe entropy function definition, we see that for a full-edge the corresponding coefficient must be $-(2 - 1) = -1$, and that for a half-edge the corresponding coefficient must be $-(1 - 1) = 0$.

Definition 15 For any $T \in \mathbb{R}_{>0}$, the Bethe partition function associated with some NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ is defined to be

$$Z_{\mathrm{B}} \triangleq \exp\left( -\frac{1}{T} \cdot \min_{\boldsymbol{\beta} \in \mathcal{B}} F_{\mathrm{B}}(\boldsymbol{\beta}) \right).$$

□

Let us comment on a variety of issues with respect to the above definition of the Bethe partition function.

• Note that here $Z_{\mathrm{B}}$ is defined such that a similar statement can be made as in Lemma 8.

• For NFGs without cycles one can show that $Z_{\mathrm{B}} = Z_{\mathrm{G}}$.

• The Bethe free energy function of NFGs with cycles can have non-global local extrema, which is in contrast to the Gibbs free energy function, which is convex and therefore has no non-global local extrema.

• Similar to Lemma 10, we can consider a modified Bethe free energy function that equals the Bethe free energy function for $\boldsymbol{\beta}$'s that are compatible with some $\mathbf{a}_{\mathrm{half}} \in \mathcal{A}_{\mathrm{half}}$, and that is infinite otherwise. Of course, the modified Bethe partition function will be a function of $\mathbf{a}_{\mathrm{half}} \in \mathcal{A}_{\mathrm{half}}$.
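Definitions 14 and 15 and the remark on cycle-free NFGs can be illustrated on the smallest possible instance. The sketch below assumes a hypothetical NFG with two function nodes joined by a single binary full-edge and made-up strictly positive local function values `g1` and `g2`; in this case the edge consistency constraints force $\boldsymbol{\beta}_{f_1} = \boldsymbol{\beta}_{f_2} = \boldsymbol{\beta}_e$, so the local marginal polytope is parameterized by a single number `b1`, and the minimization is done by a plain grid search rather than by any particular algorithm from the paper.

```python
import math

def entropy(pmf):
    """Shannon entropy in nats, with the convention 0 * log(0) = 0."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def bethe_free_energy(b1, g1, g2, T):
    """F_B for two function nodes joined by one binary full-edge (toy NFG)."""
    beta = [1.0 - b1, b1]  # beta_f1 = beta_f2 = beta_e by edge consistency
    U = -sum(b * math.log(g1[a]) for a, b in enumerate(beta)) \
        - sum(b * math.log(g2[a]) for a, b in enumerate(beta))
    # Two function-node entropy terms minus one full-edge entropy term.
    H = entropy(beta) + entropy(beta) - entropy(beta)
    return U - T * H

# Hypothetical local function values (assumption), at temperature T = 1.
g1, g2, T = [1.0, 2.0], [3.0, 0.5], 1.0

Z_G = sum(g1[a] * g2[a] for a in (0, 1))                      # Gibbs partition function
F_min = min(bethe_free_energy(k / 10000, g1, g2, T) for k in range(10001))
Z_B = math.exp(-F_min / T)                                    # Bethe partition function
```

For this cycle-free toy NFG the two partition functions agree up to the resolution of the grid, in line with the second remark above.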

In the next section, we will present an alternative charac- 
terization of the local marginal polytope, namely in terms 
of so-called finite graph covers, thereby generalizing some 
observations that were made in H, Q. This will then lead the 
way to Section VII, where we will show that the Bethe entropy function, and consequently also the Bethe partition function, can be characterized not only analytically (as was done here in Definitions 14 and 15), but also combinatorially. This combinatorial approach is based on counting valid configurations in finite graph covers of the underlying NFG.

VI. Finite Graph Covers 

This section reviews the concept of a finite graph cover of a
graph; for more details we refer the interested reader to [59].
We also refer to Q, IS), Il60l - ll62l . where finite graph covers 
were used in the context of coding theory, especially for the 
analysis of linear programming decoding and message-passing 
iterative decoding. 

Definition 16 (see, e.g., [59], [63]) A cover of a graph $\mathsf{G}$ with vertex set $\mathcal{V}$ and edge set $\mathcal{E}$ is a graph $\tilde{\mathsf{G}}$ with vertex set $\tilde{\mathcal{V}}$ and edge set $\tilde{\mathcal{E}}$, along with a surjection $\pi : \tilde{\mathcal{V}} \to \mathcal{V}$ which is a graph homomorphism (i.e., $\pi$ takes adjacent vertices of $\tilde{\mathsf{G}}$ to adjacent vertices of $\mathsf{G}$) such that for each vertex $v \in \mathcal{V}$ and each $\tilde{v} \in \pi^{-1}(v)$, the neighborhood $\partial(\tilde{v})$ of $\tilde{v}$ is mapped bijectively to $\partial(v)$. A cover is called an $M$-cover, where $M \in \mathbb{Z}_{>0}$, if $|\pi^{-1}(v)| = M$ for every vertex $v$ in $\mathcal{V}$. □

A consequence of this definition is that if $\tilde{\mathsf{G}}$ is an $M$-cover of $\mathsf{G}$ then we can choose its vertex set $\tilde{\mathcal{V}}$ to be $\tilde{\mathcal{V}} \triangleq \mathcal{V} \times [M]$: if $(v, m) \in \tilde{\mathcal{V}}$ then $\pi\big((v, m)\big) = v$, and if $\{(v_1, m_1), (v_2, m_2)\} \in \tilde{\mathcal{E}}$ then $\pi\big(\{(v_1, m_1), (v_2, m_2)\}\big) = \{v_1, v_2\}$. Another consequence is that any $M_2$-cover of any $M_1$-cover of the base graph is an $(M_2 \cdot M_1)$-cover of the base graph.

^The number $M$ is also known as the degree of the cover. (Not to be confused with the degree of a vertex.)




Fig. 5. Top left: base graph $\mathsf{G}$. Top right: sample of possible 2-covers of $\mathsf{G}$. Bottom left: a possible 3-cover of $\mathsf{G}$. Bottom right: a possible $M$-cover of $\mathsf{G}$. Here, $\sigma_{e_1}, \ldots, \sigma_{e_5}$ are arbitrary edge permutations.

Fig. 6. Two possible 2-covers of the NFG $\mathsf{N}$ that is shown in Fig. 3.

Example 17 (Q) Let G be a (base) graph with 4 vertices 
and 5 edges as shown in Fig. 5 (top left). Figs. 5 (top right), 5 (bottom left), and 5 (bottom right) show, respectively, possible 2-, 3-, and $M$-covers of $\mathsf{G}$. Note that any 2-cover of $\mathsf{G}$ must have $8 = 2 \cdot 4$ vertices and $10 = 2 \cdot 5$ edges, that any 3-cover of $\mathsf{G}$ must have $12 = 3 \cdot 4$ vertices and $15 = 3 \cdot 5$ edges, and that any $M$-cover must have $M \cdot 4$ vertices and $M \cdot 5$ edges. As depicted in Fig. 5 (bottom right), any $M$-cover of $\mathsf{G}$ is entirely specified by $|\mathcal{E}|$ edge permutations, where $\mathcal{E}$ is the edge set of $\mathsf{G}$. □

As we can see from this example, an $M$-cover $\tilde{\mathsf{G}}$ of $\mathsf{G}$ may consist of several connected components even if $\mathsf{G}$ consists of only one connected component. In general, letting $\#\mathsf{G}$ and $\#\tilde{\mathsf{G}}$ denote the number of connected components of $\mathsf{G}$ and $\tilde{\mathsf{G}}$, respectively, one can easily verify that

$$\#\mathsf{G} \leq \#\tilde{\mathsf{G}} \leq M \cdot \#\mathsf{G}. \tag{14}$$

Because NFGs are graphs, we can also consider finite graph covers of NFGs, as is done in the next example. (Note that we do not apply edge permutations to copies of half-edges.)

Fig. 7. Hierarchy of all finite graph covers of the (base) NFG $\mathsf{N}$ from Fig. 3, where the $M$th level lists all graphs in $\tilde{\mathcal{N}}_M$, i.e., all $M$-covers. (For a base NFG with $|\mathcal{E}_{\mathrm{full}}|$ full-edges, there are $|\tilde{\mathcal{N}}_M| = (M!)^{|\mathcal{E}_{\mathrm{full}}|}$ graph covers at the $M$th level.) The pseudo-marginal mappings $\boldsymbol{\varphi}_M$, $M \in \mathbb{Z}_{>0}$, and their images are specified in Definitions 22 and 24.



Example 18 Consider again the NFG $\mathsf{N}$ that is discussed in Example 2 and depicted in Fig. 3. Two possible 2-covers of this (base) NFG are shown in Fig. 6. The first graph cover is "trivial" in the sense that it consists of two disjoint copies of the NFG in Fig. 3. The second graph cover is "more interesting" in the sense that the edge permutations are such that the two copies of the base NFG are intertwined. (Of course, both graph covers are equally valid.)

Note that the $M$ copies of a function node $g_f$ are denoted by $\{g_{f,m}\}_{m \in [M]}$, and that the $M$ copies of a variable label $A_e$ are denoted by $\{A_{e,m}\}_{m \in [M]}$. In that respect, we chose the variable labels to be such that if a full-edge $e$ connects function nodes $f_i$ and $f_j$, $i < j$, then the variable label $A_{e,m}$, $m \in [M]$, will be associated with the edge that connects the function nodes $(f_i, m)$ and $(f_j, \sigma_e(m))$, where $\sigma_e : [M] \to [M]$ describes the permutation that is applied to the $M$ copies of the edge $e$. Similarly, if a half-edge $e$ is connected to the function node $f$ then the variable label $A_{e,m}$, $m \in [M]$, will be associated with the half-edge that is connected to the function node $(f, m)$. That being said, it is important to note that the results that are presented in this paper are invariant to the chosen labeling convention. □

Definition 19 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. We define the set $\tilde{\mathcal{N}}_M$ to be the set of all $M$-covers $\tilde{\mathsf{N}}$ of $\mathsf{N}$. □

Note that in this definition we consider only labeled $M$-covers as follows:

• All vertices of an $M$-cover have distinct labels, say $(f, m)$ or $g_{f,m}$ with $(f, m) \in \mathcal{F} \times [M]$.

• All edges of an $M$-cover have distinct labels, say $(e, m)$ or $A_{e,m}$ with $(e, m) \in \mathcal{E} \times [M]$.

• We do not identify $M$-covers whose graphs are isomorphic but whose vertex labels are distinct.

• However, we do identify $M$-covers whose graphs (including vertex labels) are isomorphic but whose edge labels are distinct.

(For reasons of simplicity, these vertex and edge labels are sometimes omitted in drawings.) These conventions are reflected in the following lemma that counts $M$-covers.

Lemma 20 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. Then

$$|\tilde{\mathcal{N}}_M| = (M!)^{|\mathcal{E}_{\mathrm{full}}(\mathsf{N})|}. \tag{15}$$

Proof: An $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$ can be obtained as follows.

1) For every $f \in \mathcal{F}(\mathsf{N})$, draw $M$ copies of $f$ (with distinct labels).

2) For every $e = \{f, f'\} \in \mathcal{E}_{\mathrm{full}}(\mathsf{N})$, connect the $M$ copies of $f$ and the $M$ copies of $f'$ by $M$ disjoint edges.

3) For every $e = \{f\} \in \mathcal{E}_{\mathrm{half}}(\mathsf{N})$, attach a half-edge to each of the $M$ copies of $f$.

Because the second step can be done independently for every $e \in \mathcal{E}_{\mathrm{full}}(\mathsf{N})$, because for every such edge there are $M!$ ways of connecting the $M$ copies of $f$ and the $M$ copies of $f'$ by $M$ disjoint edges, and because the obtained $M$-covers are all distinct, the result follows. ■
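The construction in the proof of Lemma 20 translates directly into an enumeration procedure. The sketch below assumes a hypothetical connected base graph with 4 vertices and 5 edges (the complete graph on four vertices with one edge removed; the graph in Fig. 5 may differ) and represents each labeled $M$-cover by one permutation per full-edge, using the convention that copy $m$ of the first endpoint is joined to copy $\sigma_e(m)$ of the second endpoint. It counts the covers and, with a small union-find, the connected components of each cover.

```python
from itertools import permutations, product

def m_covers(full_edges, M):
    """Yield every labeled M-cover as a dict mapping each edge to a permutation."""
    perms = list(permutations(range(M)))
    for choice in product(perms, repeat=len(full_edges)):
        yield dict(zip(full_edges, choice))

def num_components(vertices, cover, M):
    """Number of connected components of the cover, via union-find."""
    parent = {(v, m): (v, m) for v in vertices for m in range(M)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), sigma in cover.items():
        for m in range(M):
            # Edge copy m joins vertex copy (u, m) with vertex copy (v, sigma[m]).
            parent[find((u, m))] = find((v, sigma[m]))
    return len({find(x) for x in parent})

# Hypothetical base graph (assumption): 4 vertices, 5 edges, connected.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

covers_2 = list(m_covers(edges, 2))
components_2 = {num_components(vertices, cover, 2) for cover in covers_2}
```

For $M = 2$ this yields $(2!)^5 = 32$ covers, and because the base graph is connected, every cover has between 1 and $M$ components, in agreement with (14).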

Example 21 Consider again the NFG $\mathsf{N}$ that is discussed in Example 2 and depicted in Fig. 3. We can order all the finite graph covers of this base NFG according to the hierarchy shown in Fig. 7, where the $M$th level lists all graphs in $\tilde{\mathcal{N}}_M$, i.e., all $M$-covers. (Note that there is exactly one 1-cover, namely the base NFG itself.) The inverted pyramid alludes to the fact that the number of $M$-covers is growing with $M$, i.e., $|\tilde{\mathcal{N}}_M|$ is growing with $M$, see (15). □

The following definition specifies a collection of mappings 
that will be crucial for the rest of this section and for the next 
section. These mappings are inspired by similar mappings that 
appear in the method of types, see Definition 11.

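Definition 22 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG with local marginal polytope $\mathcal{B}$ (see Definition 13). For any $M \in \mathbb{Z}_{>0}$ we define the pseudo-marginal mapping

$$\boldsymbol{\varphi}_M : \big\{ (\tilde{\mathsf{N}}, \tilde{\mathbf{c}}) \bigm| \tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M,\ \tilde{\mathbf{c}} \in \mathcal{C}(\tilde{\mathsf{N}}) \big\} \to \mathcal{B}, \quad (\tilde{\mathsf{N}}, \tilde{\mathbf{c}}) \mapsto \boldsymbol{\beta}.$$

Here, for a given $\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M$ and $\tilde{\mathbf{c}} \in \mathcal{C}(\tilde{\mathsf{N}})$, the components of $\boldsymbol{\beta}$ are defined as follows

$$\beta_{f,\mathbf{a}_f} \triangleq \frac{1}{M} \sum_{m \in [M]} \big[ \tilde{\mathbf{c}}_{f,m} = \mathbf{a}_f \big], \quad f \in \mathcal{F},\ \mathbf{a}_f \in \mathcal{A}_f, \tag{16}$$

$$\beta_{e,a_e} \triangleq \frac{1}{M} \sum_{m \in [M]} \big[ \tilde{c}_{e,m} = a_e \big], \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e. \tag{17}$$

(In the above expressions we have used Iverson's convention that was defined in Section I-G.) □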

Note that one cannot mention a valid configuration $\tilde{\mathbf{c}}$ without mentioning the $M$-cover $\tilde{\mathsf{N}}$ in which it lives. Therefore, although the expressions in (16) and (17) do not involve the $M$-cover $\tilde{\mathsf{N}}$ explicitly, the domain of $\boldsymbol{\varphi}_M$ must be over (graph, valid configuration)-pairs $(\tilde{\mathsf{N}}, \tilde{\mathbf{c}})$, and not just over valid configurations $\tilde{\mathbf{c}}$.

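Example 23 Consider again the NFG $\mathsf{N}$ that is discussed in Example 4 and depicted in Fig. 4, which goes back to Example 2 and Fig. 3. A possible 2-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$ is shown in Fig. 8. Applying the pseudo-marginal mapping to the valid configuration $\tilde{\mathbf{c}} \in \mathcal{C}(\tilde{\mathsf{N}})$ shown in Fig. 8, we obtain the pseudo-marginal vector $\boldsymbol{\beta} = \boldsymbol{\varphi}_M(\tilde{\mathsf{N}}, \tilde{\mathbf{c}})$ with the following components. (We show only a selection of the obtained pseudo-marginals.)

• For $\beta_{f_1,\mathbf{a}_1}$ with $\mathbf{a}_1 = (a_{e_1}, a_{e_2}, a_{e_5})$:

$$\beta_{f_1,(000)} = 1, \quad \beta_{f_1,(001)} = 0, \quad \beta_{f_1,(010)} = 0, \quad \beta_{f_1,(011)} = 0,$$
$$\beta_{f_1,(100)} = 0, \quad \beta_{f_1,(101)} = 0, \quad \beta_{f_1,(110)} = 0, \quad \beta_{f_1,(111)} = 0.$$

• For $\beta_{f_2,\mathbf{a}_2}$ with $\mathbf{a}_2 = (a_{e_2}, a_{e_3}, a_{e_6})$:

$$\beta_{f_2,(000)} = \tfrac{1}{2}, \quad \beta_{f_2,(001)} = 0, \quad \beta_{f_2,(010)} = 0, \quad \beta_{f_2,(011)} = \tfrac{1}{2},$$
$$\beta_{f_2,(100)} = 0, \quad \beta_{f_2,(101)} = 0, \quad \beta_{f_2,(110)} = 0, \quad \beta_{f_2,(111)} = 0.$$

• For $\beta_{f_5,\mathbf{a}_5}$ with $\mathbf{a}_5 = (a_{e_7}, a_{e_8})$:

$$\beta_{f_5,(00)} = \tfrac{1}{2}, \quad \beta_{f_5,(01)} = 0, \quad \beta_{f_5,(10)} = 0, \quad \beta_{f_5,(11)} = \tfrac{1}{2}.$$

• For $\beta_{e_4}$:

$$\beta_{e_4,0} = 0, \quad \beta_{e_4,1} = 1.$$

• For $\beta_{e_7}$:

$$\beta_{e_7,0} = \tfrac{1}{2}, \quad \beta_{e_7,1} = \tfrac{1}{2}.$$

□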



The upcoming Theorem 25 will show that the local marginal polytope $\mathcal{B}$ is a valid choice as a co-domain of the pseudo-marginal mapping $\boldsymbol{\varphi}_M$. For this theorem, the following definition is useful.

Fig. 8. Valid configuration $\tilde{\mathbf{c}}$ on a possible 2-cover of the NFG $\mathsf{N}$ in Fig. 3. (This 2-cover is identical to the second 2-cover in Fig. 6.) For every $(e, m) \in \mathcal{E} \times [M]$, if $\tilde{c}_{e,m} = 0$ then the edge $(e, m)$ is thin and in black, whereas if $\tilde{c}_{e,m} = 1$ then the edge $(e, m)$ is thick and in red. (See Example 23 for more details.)



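Definition 24 Consider an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$. For every $M \in \mathbb{Z}_{>0}$, we define $\mathcal{B}'_M$ to be the image of the pseudo-marginal mapping $\boldsymbol{\varphi}_M$, i.e.,

$$\mathcal{B}'_M \triangleq \mathrm{image}(\boldsymbol{\varphi}_M).$$

Moreover, we define $\mathcal{B}'$ to be the union of all $\mathcal{B}'_M$, i.e.,

$$\mathcal{B}' \triangleq \bigcup_{M \in \mathbb{Z}_{>0}} \mathcal{B}'_M. \tag{18}$$

□

In words, $\mathcal{B}'$ is the set where for every $\boldsymbol{\beta} \in \mathcal{B}'$ there is some $M \in \mathbb{Z}_{>0}$ such that there is an $M$-cover $\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M$ with a valid configuration $\tilde{\mathbf{c}} \in \mathcal{C}(\tilde{\mathsf{N}})$ in it such that $\tilde{\mathbf{c}}$ maps down to $\boldsymbol{\beta}$ under the pseudo-marginal mapping $\boldsymbol{\varphi}_M$. Generalizing the language of [61], we will call

$\mathcal{B}'_M$ : set of all $M$-cover lift-realizable pseudo-marginal vectors,
$\mathcal{B}'$ : set of all lift-realizable pseudo-marginal vectors.

For any $M_1, M_2 \in \mathbb{Z}_{>0}$ with $M_1$ dividing $M_2$, one can show the following chain of set inclusions

$$\mathcal{B}'_{M_1} \subseteq \mathcal{B}'_{M_2} \subseteq \mathcal{B}'.$$

The second set inclusion follows from (18); we leave it as an exercise for the reader to verify the first set inclusion.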

Theorem 25 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG with local marginal polytope $\mathcal{B}$. The set of all lift-realizable pseudo-marginal vectors satisfies

$$\mathcal{B}' = \mathcal{B} \cap \mathbb{Q}^{\dim(\mathcal{B})},$$

which implies that B' is dense in B. Moreover, all vertices of 
B are in B'. 

Proof: This is a more or less straightforward extension of
the characterization in ID, IS) of the fundamental polytope in 
terms of valid configurations in finite graph covers. We omit 
the details. ■ 

Let us conclude this section with a few comments. 

Remark 26 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG with local marginal polytope $\mathcal{B}$. For any $M \in \mathbb{Z}_{>0}$, consider the set of all $M$-cover lift-realizable pseudo-marginal vectors $\mathcal{B}'_M$.

• It holds that

$$|\mathcal{B}'_M| \leq (M + 1)^{\dim(\mathcal{B})}.$$

This follows from the observation that every component of $\boldsymbol{\beta} \in \mathcal{B}'_M$ takes values in the set $\{ \tfrac{0}{M}, \tfrac{1}{M}, \tfrac{2}{M}, \ldots, \tfrac{M}{M} \}$. Because $\dim(\mathcal{B})$ is a fixed number for a given NFG $\mathsf{N}$, it follows that the number of elements of $\mathcal{B}'_M$ grows at most polynomially in $M$. This important fact will allow us to use the method of types in the next section. (Compare this observation also with a similar statement in Lemma 12.)

• Although the focus of this paper is mostly on the behavior of $\mathcal{B}'_M$ when $M$ goes to infinity, the set $\mathcal{B}'_M$ for $M = 1$, i.e., the set $\mathcal{B}'_1$, is also of special interest. The reason for this is that $\mathrm{conv}(\mathcal{B}'_1)$ contains all pseudo-marginal vectors that are globally realizable. Here, a pseudo-marginal vector $\boldsymbol{\beta}$ is called globally realizable [58] when there is a $\mathbf{p} \in \Pi_{\mathcal{C}}$ such that $\boldsymbol{\beta}$ contains the true marginals of $\mathbf{p}$, i.e.,

$$\beta_{f,\mathbf{a}_f} = \sum_{\mathbf{c}:\ \mathbf{c}_f = \mathbf{a}_f} p_{\mathbf{c}}, \quad f \in \mathcal{F},\ \mathbf{a}_f \in \mathcal{A}_f,$$

$$\beta_{e,a_e} = \sum_{\mathbf{c}:\ c_e = a_e} p_{\mathbf{c}}, \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e.$$

• For the NFG $\mathsf{N}$ discussed in Examples 4 and 23, one can verify that the local marginal polytope of $\mathsf{N}$ satisfies

$$\mathcal{B} \supsetneq \mathrm{conv}(\mathcal{B}'_1), \tag{19}$$

i.e., $\mathcal{B}$ is strictly larger than $\mathrm{conv}(\mathcal{B}'_1)$. This can be shown as follows. Consider the valid configuration $\tilde{\mathbf{c}}$ of the 2-cover shown in Fig. 8; its associated pseudo-marginal vector $\boldsymbol{\beta}$ does not lie in $\mathrm{conv}(\mathcal{B}'_1)$. Indeed, because all variable alphabets are $\{0, 1\}$ and because $\beta_{e,0} = 1 - \beta_{e,1}$ for all $e \in \mathcal{E}$, one can verify that the condition that the vector $\boldsymbol{\beta}$ is in $\mathrm{conv}(\mathcal{B}'_1)$ is equivalent to the condition that the vector

$$(\beta_{e_1,1}, \beta_{e_2,1}, \ldots, \beta_{e_8,1}) = \left( \tfrac{0}{2}, \tfrac{0}{2}, \tfrac{1}{2}, \tfrac{2}{2}, \tfrac{0}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2} \right)$$

is in the convex hull of the set $\mathcal{C}$ of valid configurations of $\mathsf{N}$ as listed in (3). However, the latter is not the case. Therefore, $\mathrm{conv}(\mathcal{B}'_2) \supsetneq \mathrm{conv}(\mathcal{B}'_1)$. Combining this with $\mathcal{B} \supseteq \mathrm{conv}(\mathcal{B}'_2)$, we find that $\mathcal{B}$ satisfies (19). In conclusion, valid configurations like $\tilde{\mathbf{c}}$ in Fig. 8 are the reason why $\mathcal{B}$ is strictly larger than $\mathrm{conv}(\mathcal{B}'_1)$. □

VII. Counting in Finite Graph Covers

The definitions of the Bethe entropy function and the Bethe partition function in Definitions 14 and 15, respectively, were entirely analytical. In this section we will present a combinatorial characterization of these functions in terms of counting certain valid configurations in graph covers, a characterization that was first outlined in [64], [65].

We start with the definition of a certain averaging operator. This definition is motivated by the fact that many results in this section are based on associating a real number to every $M$-cover of a base NFG and on computing the average of this value over all $M$-covers.

Definition 27 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG. For any $M \in \mathbb{Z}_{>0}$ and any function $\chi_M : \tilde{\mathcal{N}}_M \to \mathbb{R}$ we define the averaging operator to be

$$\big\langle \chi_M(\tilde{\mathsf{N}}) \big\rangle_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M} \triangleq \frac{1}{|\tilde{\mathcal{N}}_M|} \sum_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M} \chi_M(\tilde{\mathsf{N}}).$$

□



A. The Bethe Entropy Function 

The next definition introduces the function that will be key 
towards the promised combinatorial characterization of the 
Bethe entropy function. 

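Definition 28 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG and let $\mathcal{B}'$ be its set of all lift-realizable pseudo-marginal vectors. Then, for every $M \in \mathbb{Z}_{>0}$ and every $\boldsymbol{\beta} \in \mathcal{B}'$ we define the function $\chi_{M,\boldsymbol{\beta}}$ on $\tilde{\mathcal{N}}_M$ by

$$\chi_{M,\boldsymbol{\beta}}(\tilde{\mathsf{N}}) \triangleq \Big| \big\{ \tilde{\mathbf{c}} \in \mathcal{C}(\tilde{\mathsf{N}}) \bigm| \boldsymbol{\varphi}_M(\tilde{\mathsf{N}}, \tilde{\mathbf{c}}) = \boldsymbol{\beta} \big\} \Big|. \tag{20}$$

□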



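Note that for an $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$ the value of $\chi_{M,\beta}(\tilde{\mathsf{N}})$ represents the number of valid configurations in $\tilde{\mathsf{N}}$ that map down to $\beta$. Consequently,

$$C_M(\beta) \triangleq \big\langle \chi_{M,\beta}(\tilde{\mathsf{N}}) \big\rangle_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M} \tag{21}$$

is the average number of valid configurations that map down to $\beta$, where the averaging is over all $M$-covers of $\mathsf{N}$. (Observe that this is the same $C_M(\beta)$ as in Section I.) Letting $\varphi_M^{-1}$ denote the inverse of the mapping $\varphi_M$, the quantity $C_M(\beta)$ can also be written in terms of the pre-image of $\beta$ under the mapping $\varphi_M$,

$$C_M(\beta) = \frac{|\varphi_M^{-1}(\beta)|}{|\tilde{\mathcal{N}}_M|}. \tag{22}$$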

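Lemma 29 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be some NFG, and for every $M \in \mathbb{Z}_{>0}$ let $\mathcal{B}'_M$ be its set of all $M$-cover lift-realizable pseudo-marginal vectors. Then for every $\beta \in \mathcal{B}'_M$ we have

$$C_M(\beta) = \frac{\prod_{f \in \mathcal{F}} \binom{M}{M \cdot \beta_f}}{\prod_{e \in \mathcal{E}_{\mathrm{full}}} \binom{M}{M \cdot \beta_e}}, \tag{23}$$

where we have used the multinomial coefficients

$$\binom{M}{M \cdot \beta_f} \triangleq \frac{M!}{\prod_{a_f} (M \beta_{f,a_f})!}, \qquad \binom{M}{M \cdot \beta_e} \triangleq \frac{M!}{\prod_{a_e} (M \beta_{e,a_e})!}.$$

(Note that the components of $M \cdot \beta$ are non-negative integers and so these expressions are indeed well defined.)

Proof: See Appendix A. ■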

Note that the multinomial coefficients that appear in the above expression for $C_M(\beta)$ have different origins. Namely, the multinomial coefficients in the numerator of $C_M(\beta)$ stem from counting locally valid configurations at the function nodes (see the proof of Lemma 29 in Appendix A for the definition of "locally valid configurations"), whereas the multinomial coefficients in the denominator of $C_M(\beta)$ stem from counting the number of edge connections that lead to overall valid configurations, and from the division by the total number of $M$-covers.

The next theorem states the first main result of this section (and also of this paper). It connects the asymptotic behavior of $C_M(\beta)$ with the Bethe entropy function value of $\beta$. Therefore, this result gives the promised combinatorial characterization of the Bethe entropy function value of $\beta$. (The second main result of this section will be the combinatorial characterization of the Bethe partition function presented in Theorem 33.)

Theorem 30 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be some NFG and let $\mathcal{B}'$ be its set of lift-realizable pseudo-marginal vectors. For any $\beta \in \mathcal{B}'$ we have

$$\limsup_{M \to \infty} \frac{1}{M} \log\big(C_M(\beta)\big) = H_{\mathrm{B}}(\beta).$$

Proof: There are infinitely many $M \in \mathbb{Z}_{>0}$ such that $\beta \in \mathcal{B}'_M$. This can be seen as follows. By definition of $\mathcal{B}'$, there must be at least one $M^* \in \mathbb{Z}_{>0}$ such that $\beta \in \mathcal{B}'_{M^*}$. However, because $\beta \in \mathcal{B}'_{M^*}$ implies that $\beta \in \mathcal{B}'_M$ holds for any $M \in \mathbb{Z}_{>0}$ that is divisible by $M^*$, there are in fact infinitely many $M \in \mathbb{Z}_{>0}$ such that $\beta \in \mathcal{B}'_M$. The theorem statement then follows by combining Lemma 29, the results

$$\binom{M}{M \cdot \beta_f} = \exp\Big( -M \cdot \sum_{a_f} \beta_{f,a_f} \log(\beta_{f,a_f}) + o(M) \Big),$$

$$\binom{M}{M \cdot \beta_e} = \exp\Big( -M \cdot \sum_{a_e} \beta_{e,a_e} \log(\beta_{e,a_e}) + o(M) \Big),$$

for $\beta \in \mathcal{B}'_M$ (which are consequences of Stirling's approximation of the factorial function), and the definition of the Bethe entropy function in Definition 14. ■

A straightforward consequence of Theorem 30 is that for $\beta \in \mathcal{B}'_M$ we have

$$C_M(\beta) = \exp\big( M \cdot H_{\mathrm{B}}(\beta) + o(M) \big). \tag{24}$$

Therefore, the Bethe entropy function value of $\beta$ has the meaning of being the asymptotic growth rate of the average number of valid configurations in $M$-covers that map down to $\beta$, where the averaging is over all $M$-covers of $\mathsf{N}$, and where asymptotic is in the sense that $M$ goes to infinity.
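The growth-rate statement can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the quotient-of-multinomials form of $C_M(\beta)$ from Lemma 29 and the usual form of the Bethe entropy (sum of function-node entropies minus sum of full-edge entropies). The toy NFG (two single-parity-check nodes joined by three parallel binary full edges, with uniform local marginals) and all function names are hypothetical choices made for illustration.

```python
import math

def log_multinomial(M, counts):
    """log( M! / prod_k counts[k]! ), via lgamma to avoid huge integers."""
    assert sum(counts) == M
    return math.lgamma(M + 1) - sum(math.lgamma(k + 1) for k in counts)

def log_avg_count(M, beta_f, beta_e):
    """log C_M(beta): function-node multinomials divided by full-edge multinomials."""
    def counts(b):
        ks = [round(M * p) for p in b]
        # M * beta must have integer components (beta is M-cover lift-realizable)
        assert all(abs(k - M * p) < 1e-9 for k, p in zip(ks, b))
        return ks
    numerator = sum(log_multinomial(M, counts(b)) for b in beta_f)
    denominator = sum(log_multinomial(M, counts(b)) for b in beta_e)
    return numerator - denominator

def bethe_entropy(beta_f, beta_e):
    """Assumed form: sum_f H(beta_f) - sum_{full edges e} H(beta_e)."""
    entropy = lambda b: -sum(p * math.log(p) for p in b if p > 0)
    return sum(entropy(b) for b in beta_f) - sum(entropy(b) for b in beta_e)

# Hypothetical toy NFG: each node has 4 locally valid configurations
# (000, 011, 101, 110), each with local marginal 1/4; each edge marginal is (1/2, 1/2).
beta_f = [[0.25] * 4, [0.25] * 4]
beta_e = [[0.5, 0.5]] * 3

H_B = bethe_entropy(beta_f, beta_e)  # equals 2*log(4) - 3*log(2) = log(2)
rates = {M: log_avg_count(M, beta_f, beta_e) / M for M in (4, 40, 400, 4000)}
for M, rate in rates.items():
    print(M, rate, H_B)
```

In this sketch the rate $(1/M) \log C_M(\beta)$ approaches $H_{\mathrm{B}}(\beta) = \log 2$ from below, with a gap of order $(\log M)/M$ coming from the $o(M)$ term.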

At this point, we encourage the reader to compare the observations that were made so far in this section with similar statements that were made in Lemma 12 with respect to the counting function $C_M(\cdot)$ considered there. There are many similarities, but also some key differences. One key difference is the following:

• In the setup of the present section, we count the average number of certain valid configurations in $M$-covers of some NFG $\mathsf{N}$. Most importantly, every $M$-cover is obtained by suitably "intertwining" $M$ independent and identical copies of $\mathsf{N}$.

• In the setup of Section IV we count certain valid configurations in $\mathsf{N}^M$, which corresponds to counting certain valid configurations in the $M$-cover of $\mathsf{N}$ that consists of $M$ independent and identical copies of $\mathsf{N}$.†

In conclusion: when counting certain valid configurations in "intertwined" $M$-covers we get the Bethe entropy function, whereas when counting certain valid configurations in "non-intertwined" $M$-covers we get the Gibbs entropy function. (See also the comments at the end of the upcoming Section VII-D.)



B. The Bethe Average Energy Function 

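In this subsection we show how the global function of an $M$-cover of some base NFG can be expressed in terms of the Bethe average energy function of this base NFG.

Theorem 31 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be some NFG and let $M \in \mathbb{Z}_{>0}$. Then for any $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$ and any $\tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})$ we have

$$-\frac{1}{M} \log g_{\tilde{\mathsf{N}}}(\tilde{c}) = U_{\mathrm{B}}(\beta) \Big|_{\beta = \varphi_M(\tilde{\mathsf{N}}, \tilde{c})}.$$

(Note that this expression does not involve a limit $M \to \infty$.)

Proof: Let $\tilde{\mathcal{F}}$ be the set of function nodes of $\tilde{\mathsf{N}}$. Then

$$g_{\tilde{\mathsf{N}}}(\tilde{c}) = \prod_{\tilde{f} \in \tilde{\mathcal{F}}} g_{\tilde{f}}(\tilde{c}_{\tilde{f}}) = \prod_{f \in \mathcal{F}} \prod_{a_f} \big( g_f(a_f) \big)^{M \cdot \beta_{f,a_f}},$$

where the second equality holds because $M \cdot \beta_{f,a_f}$ is the number of copies of $f$ in $\tilde{\mathsf{N}}$ whose local configuration equals $a_f$. Taking the logarithm on both sides, multiplying both sides by $-\frac{1}{M}$, and using the definition of the Bethe average energy function (see Definition 14), we obtain the expression stated in the theorem. ■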
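The identity of Theorem 31 can be verified on a small instance. The sketch below is an illustration under stated assumptions rather than the paper's construction: the base NFG (two function nodes `f1`, `f2` joined by two parallel binary edges), the strictly positive random local functions, and the form $U_{\mathrm{B}}(\beta) = -\sum_f \sum_{a_f} \beta_{f,a_f} \log g_f(a_f)$ of the Bethe average energy are all assumed here.

```python
import math
import random

random.seed(0)
M = 5

# Hypothetical base NFG: nodes f1 and f2 joined by two parallel binary edges.
# Local functions are strictly positive, so every configuration is valid.
g = {f: {(a, b): random.uniform(0.5, 2.0) for a in (0, 1) for b in (0, 1)}
     for f in ("f1", "f2")}

# An M-cover: along edge e, copy m of f1 is wired to copy sigma[e][m] of f2.
sigma = [random.sample(range(M), M) for _ in range(2)]
inv = [[s.index(k) for k in range(M)] for s in sigma]

# A configuration of the cover: one value per lifted edge,
# indexed by the f1-side copy the lifted edge is attached to.
c = [[random.randint(0, 1) for _ in range(M)] for _ in range(2)]

# Local configurations seen by each copy of each function node.
local = {
    "f1": [(c[0][m], c[1][m]) for m in range(M)],
    "f2": [(c[0][inv[0][k]], c[1][inv[1][k]]) for k in range(M)],
}

# Global function value of the cover configuration.
g_cover = math.prod(g[f][a] for f in local for a in local[f])

# Pseudo-marginal: fraction of the M copies of f that see local configuration a.
beta = {f: {a: local[f].count(a) / M for a in g[f]} for f in local}

# Assumed form of the Bethe average energy function.
U_B = -sum(beta[f][a] * math.log(g[f][a]) for f in g for a in g[f])

lhs = -math.log(g_cover) / M
print(lhs, U_B)
```

Both printed values agree up to floating-point error for any choice of the permutations `sigma` and the configuration `c`, which mirrors the remark that the statement involves no limit in $M$.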

Recall the definition of globally realizable pseudo-marginal vectors from Remark 26. One can easily show that for every $\beta \in \operatorname{conv}(\mathcal{B}'_1)$ it holds that

$$U_{\mathrm{B}}(\beta) = U_{\mathrm{G}}(p),$$

where $p \in \Pi_{\mathcal{C}}$ is the distribution whose marginals are given by $\beta$. In this sense, the Bethe average energy function can be seen as a "straightforward extension" of the Gibbs average energy function from the domain $\operatorname{conv}(\mathcal{B}'_1)$ to the domain $\mathcal{B}$.

C. The Bethe Free Energy Function 

An immediate consequence of Theorems 30 and 31 is that for any temperature $T \in \mathbb{R}_{>0}$, any $M \in \mathbb{Z}_{>0}$, any $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$, and any $\tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})$ it holds that

$$g_{\tilde{\mathsf{N}}}(\tilde{c})^{1/T} \cdot C_M(\beta) = \exp\Big( -\frac{M}{T} \cdot F_{\mathrm{B}}(\beta) + o(M) \Big), \tag{25}$$

where $\beta = \varphi_M(\tilde{\mathsf{N}}, \tilde{c})$.

† More precise would be "... which corresponds to counting certain valid configurations in one of the $M$-covers of $\mathsf{N}$ that consists of $M$ independent and identical copies of $\mathsf{N}$."

D. The Degree-M Bethe Partition Function

The above developments motivate the following definition of a degree-$M$ Bethe partition function, which, as we will show, has the property that in the limit $M \to \infty$ it converges to the Bethe partition function. Note that in contrast to the definition of the Bethe partition function in Definition 15, which was analytical, the definition of the degree-$M$ Bethe partition function is combinatorial.

Definition 32 Let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG. For any temperature $T \in \mathbb{R}_{>0}$ and any $M \in \mathbb{Z}_{>0}$, we define the degree-$M$ Bethe partition function to be

$$Z_{\mathrm{B},M}(\mathsf{N}) \triangleq \sqrt[M]{\big\langle Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \big\rangle_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M}}.$$

(Note that the right-hand side of the above expression is based on the Gibbs partition function, see (4), and not on the Bethe partition function.) □

From the above expression we see that the degree-$M$ Bethe partition function is defined to be the $M$th root of the average Gibbs partition function, where the averaging is done over all $M$-covers of $\mathsf{N}$.

With this we are in a position to formulate the second main 
result of this section (and of this paper). 

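Theorem 33 For any NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ and any temperature $T \in \mathbb{R}_{>0}$ it holds that

$$\limsup_{M \to \infty} Z_{\mathrm{B},M}(\mathsf{N}) = Z_{\mathrm{B}}(\mathsf{N}).$$

Proof: See Appendix B. ■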

Example 34 To improve one's understanding of $Z_{\mathrm{B},M}$, it is helpful to explicitly compute this quantity for small NFGs and small values of $M$. To this end, consider the NFG $\mathsf{N}$ in the lower left corner of Fig. 9. Assume that the variable alphabets and local functions are defined analogously to the variable alphabets and local functions of the NFG in Example 4. (The
same NFG was also discussed in ^ Example 29] and in MQI 
Example 2.4].) 

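One can easily verify that all valid configurations take on the global function value one. Moreover, because $\mathsf{N}$ does not have half-edges, the set $\{ e \in \mathcal{E} \mid c_e = 1 \}$ associated with a valid configuration $c$ forms a cycle or an edge-disjoint union of cycles in $\mathsf{N}$. With this, the set $\mathcal{C}(\mathsf{N})$ of valid configurations contains four elements, as shown in the last row of Fig. 9, i.e.,

$$Z_{\mathrm{G}}(\mathsf{N}) = 4.$$

Because $\mathsf{N}$ has seven edges, there are $2^7 = 128$ distinct 2-covers: 32 of them are (when omitting the cover-related parts of the vertex and edge labels) isomorphic to $\tilde{\mathsf{N}}_{2,1}$, 32 of them are isomorphic to $\tilde{\mathsf{N}}_{2,2}$, 32 of them are isomorphic to $\tilde{\mathsf{N}}_{2,3}$, and 32 of them are isomorphic to $\tilde{\mathsf{N}}_{2,4}$ shown on the left-hand side of Fig. 9. One can verify that

$$Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,1}) = 16, \qquad Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,2}) = Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,3}) = Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,4}) = 8.$$

Therefore, the degree-2 Bethe partition function is

$$Z_{\mathrm{B},2}(\mathsf{N}) = \sqrt{\frac{1}{128} \sum_{h \in [4]} 32 \cdot Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,h})} = \sqrt{10} \approx 3.162.$$

Some comments:

• The 2-cover $\tilde{\mathsf{N}}_{2,1}$ consists of two copies of $\mathsf{N}$ and consequently we have $Z_{\mathrm{G}}(\tilde{\mathsf{N}}_{2,1}) = (Z_{\mathrm{G}}(\mathsf{N}))^2 = 4^2 = 16$. If $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) = (Z_{\mathrm{G}}(\mathsf{N}))^2$ were true for all 2-covers $\tilde{\mathsf{N}}$ of $\mathsf{N}$, then $Z_{\mathrm{B},2}(\mathsf{N}) = Z_{\mathrm{G}}(\mathsf{N})$.

• As we can see from Fig. 9, there are 2-covers $\tilde{\mathsf{N}}$ such that $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \neq (Z_{\mathrm{G}}(\mathsf{N}))^2$. Therefore, it is not surprising that $Z_{\mathrm{B},2}(\mathsf{N}) \neq Z_{\mathrm{G}}(\mathsf{N})$.

• If all 2-covers were like $\tilde{\mathsf{N}}_{2,h}$, $h \in [3]$, then the set of 2-cover lift-realizable pseudo-marginal vectors would satisfy $\operatorname{conv}(\mathcal{B}'_2) = \operatorname{conv}(\mathcal{B}'_1)$. However, the 2-cover $\tilde{\mathsf{N}}_{2,4}$ contains some configurations whose associated pseudo-marginal vector does not lie within $\operatorname{conv}(\mathcal{B}'_1)$. Therefore, $\operatorname{conv}(\mathcal{B}'_2) \supsetneq \operatorname{conv}(\mathcal{B}'_1)$, and so, because $\mathcal{B} \supseteq \operatorname{conv}(\mathcal{B}'_2)$, we literally see why the NFG $\mathsf{N}$ is an example where the local marginal polytope satisfies $\mathcal{B} \supsetneq \operatorname{conv}(\mathcal{B}'_1)$. (See Remark 26 for a related observation.)

• As mentioned in Section I-E, one can also give a combinatorial characterization of the Bethe partition function of an NFG $\mathsf{N}$ in terms of computation trees and the universal cover of $\mathsf{N}$. However, in many respects, finite graph covers are easier to deal with, and, as this example shows, many effects that are responsible for the similarities and differences between $Z_{\mathrm{B}}(\mathsf{N})$ and $Z_{\mathrm{G}}(\mathsf{N})$ are already visible in finite graph covers with small cover degree $M$. □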
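The computation in Example 34 can be reproduced by enumerating all 2-covers. The sketch below is hedged in two ways. First, the graph of Fig. 9 is not available here, so a hypothetical stand-in with the same size is used (a 6-cycle plus one chord: 6 function nodes, 7 edges, no half-edges). Second, the Gibbs partition function is computed with the circuit-rank formula $Z_{\mathrm{G}} = 2^{|\mathcal{E}| - |\mathcal{F}| + \#\mathsf{N}}$, which assumes that valid configurations are exactly the edge subsets with even degree at every node.

```python
from collections import Counter
from itertools import product
import math

def num_components(n_vertices, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n_vertices))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n_vertices)})

def Z_gibbs(n_vertices, edges):
    """Number of even-degree edge subsets, i.e. 2^(circuit rank)."""
    return 2 ** (len(edges) - n_vertices + num_components(n_vertices, edges))

# Hypothetical stand-in for the NFG of Fig. 9: a 6-cycle plus the chord (0, 3).
V = 6
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]

def two_cover(swaps):
    """Copy m of vertex v is numbered 2*v + m; edge i is crossed iff swaps[i] == 1."""
    lifted = []
    for (u, v), s in zip(E, swaps):
        for m in (0, 1):
            lifted.append((2 * u + m, 2 * v + (m ^ s)))
    return lifted

# One Gibbs partition function value per 2-cover (2^7 = 128 covers in total).
values = [Z_gibbs(2 * V, two_cover(s)) for s in product((0, 1), repeat=len(E))]
histogram = Counter(values)

# Degree-2 Bethe partition function: square root of the average over all 2-covers.
Z_B2 = math.sqrt(sum(values) / len(values))
print(Z_gibbs(V, E), dict(histogram), Z_B2)
```

For this stand-in graph, 32 of the 128 covers are disconnected (two copies of the base graph) and 96 are connected, which gives the same multiset of $Z_{\mathrm{G}}$ values as in the example. Any connected graph with 6 nodes and 7 edges would behave the same way under the stated assumptions.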



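As the following lemma shows, it is no coincidence that $Z_{\mathrm{B},2}(\mathsf{N})$ is a lower bound on $Z_{\mathrm{G}}(\mathsf{N})$ for the NFG $\mathsf{N}$ in Example 34. Let $\#\mathsf{N}$ be the number of connected components of $\mathsf{N}$, when $\mathsf{N}$ is considered as a graph.

Lemma 35 Consider an NFG $\mathsf{N}$ as defined in Example 34, in particular without half-edges. For any $M \in \mathbb{Z}_{>0}$ it holds that

$$2^{-((M-1)/M) \cdot \#\mathsf{N}} \cdot Z_{\mathrm{G}}(\mathsf{N}) \leq Z_{\mathrm{B},M}(\mathsf{N}) \leq Z_{\mathrm{G}}(\mathsf{N}),$$

$$2^{-\#\mathsf{N}} \cdot Z_{\mathrm{G}}(\mathsf{N}) \leq Z_{\mathrm{B}}(\mathsf{N}) \leq Z_{\mathrm{G}}(\mathsf{N}). \tag{26}$$

Equivalently,

$$Z_{\mathrm{B},M}(\mathsf{N}) \leq Z_{\mathrm{G}}(\mathsf{N}) \leq 2^{((M-1)/M) \cdot \#\mathsf{N}} \cdot Z_{\mathrm{B},M}(\mathsf{N}),$$

$$Z_{\mathrm{B}}(\mathsf{N}) \leq Z_{\mathrm{G}}(\mathsf{N}) \leq 2^{\#\mathsf{N}} \cdot Z_{\mathrm{B}}(\mathsf{N}). \tag{27}$$

Proof: Because $Z_{\mathrm{G}}(\mathsf{N})$ equals the number of cycles and edge-disjoint unions of cycles of $\mathsf{N}$, we get

$$Z_{\mathrm{G}}(\mathsf{N}) = 2^{\operatorname{circ}(\mathsf{N})}, \tag{28}$$

where

$$\operatorname{circ}(\mathsf{N}) \triangleq |\mathcal{E}(\mathsf{N})| - |\mathcal{F}(\mathsf{N})| + \#\mathsf{N} \tag{29}$$

is the circuit rank of $\mathsf{N}$. Similarly, for any $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$ we obtain

$$Z_{\mathrm{G}}(\tilde{\mathsf{N}}) = 2^{\operatorname{circ}(\tilde{\mathsf{N}})}, \tag{30}$$

where

$$\operatorname{circ}(\tilde{\mathsf{N}}) = |\mathcal{E}(\tilde{\mathsf{N}})| - |\mathcal{F}(\tilde{\mathsf{N}})| + \#\tilde{\mathsf{N}}. \tag{31}$$

From straightforward graph-theoretic considerations of $M$-covers, in particular (14), it follows that

$$|\mathcal{E}(\tilde{\mathsf{N}})| = M \cdot |\mathcal{E}(\mathsf{N})|, \tag{32}$$

$$|\mathcal{F}(\tilde{\mathsf{N}})| = M \cdot |\mathcal{F}(\mathsf{N})|, \tag{33}$$

$$\#\mathsf{N} \leq \#\tilde{\mathsf{N}} \leq M \cdot \#\mathsf{N}. \tag{34}$$

Combining (29), (31), and (32)-(34), we obtain

$$M \cdot \operatorname{circ}(\mathsf{N}) - (M-1) \cdot \#\mathsf{N} \leq \operatorname{circ}(\tilde{\mathsf{N}}) \leq M \cdot \operatorname{circ}(\mathsf{N}).$$

Then, with the help of (28) and (30), we get

$$2^{-(M-1) \cdot \#\mathsf{N}} \cdot \big( Z_{\mathrm{G}}(\mathsf{N}) \big)^M \leq Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \leq \big( Z_{\mathrm{G}}(\mathsf{N}) \big)^M.$$

Plugging these expressions into the definition of $Z_{\mathrm{B},M}(\mathsf{N})$, see Definition 32, yields the result that was promised in the lemma statement. ■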
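The circuit-rank inequalities used in the proof can be spot-checked on random covers. The following sketch makes the same assumptions as the lemma (no half-edges) and uses a hypothetical base graph (a 6-cycle with one chord); `random_cover` builds an $M$-cover by drawing one independent random permutation per edge, which is one natural way to sample covers and is an assumption of this illustration.

```python
import random

def components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

def circuit_rank(n, edges):
    """circ = |E| - |F| + (number of connected components)."""
    return len(edges) - n + components(n, edges)

def random_cover(n, edges, M, rng):
    """Lift edge (u, v) to the M edges ((u, m), (v, perm[m])); vertex (v, m) is v*M + m."""
    lifted = []
    for u, v in edges:
        perm = rng.sample(range(M), M)
        lifted += [(u * M + m, v * M + perm[m]) for m in range(M)]
    return n * M, lifted

rng = random.Random(1)
n, edges = 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]  # hypothetical base graph
M = 3
base_rank, base_comp = circuit_rank(n, edges), components(n, edges)

all_within_bounds = True
for _ in range(200):
    n_cover, e_cover = random_cover(n, edges, M, rng)
    r = circuit_rank(n_cover, e_cover)
    lower = M * base_rank - (M - 1) * base_comp
    upper = M * base_rank
    all_within_bounds = all_within_bounds and (lower <= r <= upper)
print(base_rank, base_comp, all_within_bounds)
```

Because $Z_{\mathrm{G}} = 2^{\operatorname{circ}}$ in this setting, every sampled cover satisfying the circuit-rank bounds also satisfies the corresponding bounds on $Z_{\mathrm{G}}(\tilde{\mathsf{N}})$ from the proof.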

We conclude this subsection with a few comments and 
observations. 

• Of course, the complexity of computing $\#\mathsf{N}$ and $\operatorname{circ}(\mathsf{N})$ is polynomial in $|\mathcal{F}(\mathsf{N})|$ and $|\mathcal{E}(\mathsf{N})|$; therefore, the complexity of computing $Z_{\mathrm{G}}(\mathsf{N})$ in Lemma 35 is polynomial in $|\mathcal{F}(\mathsf{N})|$ and $|\mathcal{E}(\mathsf{N})|$. Moreover, computing the lower and upper bounds in (27) is equally complex. The relevance of Example 34 and Lemma 35 is therefore not that an intractable partition function $Z_{\mathrm{G}}(\mathsf{N})$ is approximated by lower and upper bounds based on a tractable Bethe partition function $Z_{\mathrm{B}}(\mathsf{N})$, but that they exhibit an example where explicit computations can easily be done and insight can be gained into the formalism presented in this section.

• The inequalities in (26) and (27) can also easily be obtained by minimizing the Bethe free energy function and using the expression in Definition 15. However, using the combinatorial characterization of $Z_{\mathrm{B}}(\mathsf{N})$ gives us additional insight into why (26) holds. In fact, analyzing the proof of Lemma 35 we see that these inequalities are a straightforward consequence of the graph-theoretic inequalities

$$\#\mathsf{N} \leq \#\tilde{\mathsf{N}} \leq M \cdot \#\mathsf{N}$$

that hold for any $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$, see also (14).

• Note that proving the inequality $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \leq (Z_{\mathrm{G}}(\mathsf{N}))^M$ for every $M$-cover $\tilde{\mathsf{N}}$ of $\mathsf{N}$, as was done in the proof of Lemma 35, was also at the heart of the recent proof by Ruozzi of the inequality $Z_{\mathrm{B}}(\mathsf{N}) \leq Z_{\mathrm{G}}(\mathsf{N})$ for log-supermodular graphical models $\mathsf{N}$. (This verified a conjecture by Sudderth, Wainwright, and Willsky [46].) Moreover, the paper [41] presented setups where the inequality $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \leq (Z_{\mathrm{G}}(\mathsf{N}))^M$ holds for every $M$-cover $\tilde{\mathsf{N}}$ of an NFG $\mathsf{N}$ whose partition function represents the permanent of a non-negative matrix, and pointed out setups where this inequality is conjectured to hold. Further NFGs where this inequality is conjectured to hold were listed by Watanabe [44].



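Fig. 10. The degree-$M$ Bethe partition function $Z_{\mathrm{B},M}(\mathsf{N})$ of the NFG $\mathsf{N}$ for different values of $M$, with $Z_{\mathrm{B},M}(\mathsf{N})|_{M=1} = Z_{\mathrm{G}}(\mathsf{N})$ and $Z_{\mathrm{B},M}(\mathsf{N})|_{M \to \infty} = Z_{\mathrm{B}}(\mathsf{N})$.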



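• When considering the value of $Z_{\mathrm{B},M}(\mathsf{N})$ from $M = 1$ to $M = \infty$, one goes from $Z_{\mathrm{G}}(\mathsf{N})$ to $Z_{\mathrm{B}}(\mathsf{N})$, see Fig. 10. It is worthwhile to consider the inequalities that appear in Lemma 35 from this perspective.

• We can write the ratio $Z_{\mathrm{B}}(\mathsf{N}) / Z_{\mathrm{G}}(\mathsf{N})$ as the following telescoping product:

$$\frac{Z_{\mathrm{B}}(\mathsf{N})}{Z_{\mathrm{G}}(\mathsf{N})} = \lim_{M \to \infty} \prod_{M'=1}^{M-1} \frac{Z_{\mathrm{B},M'+1}(\mathsf{N})}{Z_{\mathrm{B},M'}(\mathsf{N})}.$$

Towards a better understanding of the ratio $Z_{\mathrm{B}}(\mathsf{N}) / Z_{\mathrm{G}}(\mathsf{N})$, it might therefore be worthwhile to study the ratios $Z_{\mathrm{B},M+1}(\mathsf{N}) / Z_{\mathrm{B},M}(\mathsf{N})$, $M \in \mathbb{Z}_{>0}$. We leave it as an open problem to see if general statements can be made about them.

• Let $\tilde{\mathcal{N}}_M^{\parallel}$ be the subset of $\tilde{\mathcal{N}}_M$ that contains all $M$-covers $\tilde{\mathsf{N}}$ that consist of $M$ disconnected copies of $\mathsf{N}$. It holds that $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) = (Z_{\mathrm{G}}(\mathsf{N}))^M$ for any $\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M^{\parallel}$, and so, trivially,

$$Z_{\mathrm{G}}(\mathsf{N}) = \sqrt[M]{\big\langle Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \big\rangle_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M^{\parallel}}}.$$

• If $\mathsf{N}$ does not contain any cycles, then

$$\tilde{\mathcal{N}}_M^{\parallel} = \tilde{\mathcal{N}}_M, \quad M \in \mathbb{Z}_{>0},$$

and so $Z_{\mathrm{G}}(\mathsf{N}) = Z_{\mathrm{B}}(\mathsf{N})$.

• If $\mathsf{N}$ does contain cycles then

$$\tilde{\mathcal{N}}_M^{\parallel} \subsetneq \tilde{\mathcal{N}}_M, \quad M \in \mathbb{Z}_{>1}.$$

Because usually $Z_{\mathrm{G}}(\tilde{\mathsf{N}}) \neq (Z_{\mathrm{G}}(\mathsf{N}))^M$ for $\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M \setminus \tilde{\mathcal{N}}_M^{\parallel}$, it is not surprising that usually $Z_{\mathrm{G}}(\mathsf{N}) \neq Z_{\mathrm{B}}(\mathsf{N})$ for an NFG $\mathsf{N}$ with cycles.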

E. Similarities and Differences w.r.t. the Replica Method 

In this subsection we discuss similarities and differences between, on the one hand, the concepts and the mathematical expressions that have so far appeared in this section, and, on the other hand, concepts and mathematical expressions that appear in the replica theory (see, e.g., [34], [35, Chapter 8], [36, Appendix I]).

Let $\mathsf{N}_{N,R}$ be an NFG with "size parameter" $N$ whose local functions depend on a random variable (or random vector) $R$. Assume that $\mathsf{N}_{N,R}$ represents some physical system. Many interesting physical quantities about this physical system can then be derived from the normalized log-partition function $\frac{1}{N} \log(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))$. However, because this expression is usually not tractable, one studies the ensemble average $\mathrm{E}\big[\frac{1}{N} \log(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))\big]$, where the expectation value is w.r.t. $R$. If "measure concentration" happens for large $N$, then the normalized log-partition function of the "typical" NFG will be close to this expression for large $N$.

Direct evaluation of $\mathrm{E}\big[\frac{1}{N} \log(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))\big]$ is often not possible. This is the point where the replica method comes in. Inspired by the equation

$$\log(z) = \lim_{M \downarrow 0} \frac{z^M - 1}{M}, \quad z \in \mathbb{R}_{>0},$$

where $M$ is considered to be a real number, the replica method proposes the following reformulation of the above expectation value:

$$\mathrm{E}\Big[\frac{1}{N} \log\big(Z_{\mathrm{G}}(\mathsf{N}_{N,R})\big)\Big] = \lim_{M \downarrow 0} \frac{\mathrm{E}\big[\big(Z_{\mathrm{G}}(\mathsf{N}_{N,R})\big)^M\big] - 1}{N M}.$$

One then notices that for positive integers $M$ the term $(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))^M$, which appears on the right-hand side of the above expression, corresponds to considering the partition function of $M$ independent copies of $\mathsf{N}_{N,R}$ (hence the name "replica theory"). After evaluating $\mathrm{E}\big[(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))^M\big]$ for positive integers $M$, one then drops this requirement on $M$, and evaluates the limit $M \downarrow 0$. This is the gist of the replica method. Much more can, and needs to, be said, for which we refer to [34], [35, Chapter 8], [36, Appendix I] (and references therein).
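The scalar identity behind the replica reformulation is easy to probe numerically. The snippet below is only an illustration of $\log(z) = \lim_{M \downarrow 0} (z^M - 1)/M$ for one arbitrarily chosen value of $z$; the function name `replica_quotient` is introduced here and does not come from the paper.

```python
import math

def replica_quotient(z, M):
    """(z^M - 1) / M for real M > 0; tends to log(z) as M decreases to 0."""
    return (z ** M - 1.0) / M

z = 7.5  # arbitrary positive test value
approximations = [(M, replica_quotient(z, M)) for M in (1.0, 0.1, 0.01, 0.001, 1e-6)]
for M, value in approximations:
    print(M, value, math.log(z))
```

The error of the quotient is roughly $M \cdot (\log z)^2 / 2$ for small $M$, so the printed values approach $\log(7.5)$ from above as $M$ shrinks.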

Clearly, there are similarities between the replica method and the developments in this paper. However, there are also some stark differences.

• The NFG $\mathsf{N}_{N,R}$ depends on a random variable $R$, whereas the NFG $\mathsf{N}$ that is studied in this paper is deterministic. (Of course, our setup also allows NFGs that depend on a random variable, but that is not necessary.)

• Because the random variable $R$ is the same in all $M$ copies, there is the appearance of some "coupling effect" between the $M$ copies of $\mathsf{N}_{N,R}$ when evaluating $\mathrm{E}\big[(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))^M\big]$. This is in contrast to the "coupling effect" that appears in an $M$-cover of $\mathsf{N}$ as a result of the graph-cover construction process where the edges of $M$ independent copies of $\mathsf{N}$ are permuted.

• The replica method is based on studying the limit $M \downarrow 0$, whereas this paper typically studies the limit $M \to \infty$. Moreover, this paper never drops the requirement that $M$ is a positive integer.

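Note that in coding theory, when studying the growth rate of the average Hamming weight enumerator of a code ensemble [66], one usually evaluates an expression like $\frac{1}{N} \log\big(\mathrm{E}[Z_{\mathrm{G}}(\mathsf{N}_{N,R})]\big)$. This quantity can be the same as the above-mentioned $\mathrm{E}\big[\frac{1}{N} \log(Z_{\mathrm{G}}(\mathsf{N}_{N,R}))\big]$, but in general one can only state that

$$\frac{1}{N} \log\big(\mathrm{E}[Z_{\mathrm{G}}(\mathsf{N}_{N,R})]\big) \geq \mathrm{E}\Big[\frac{1}{N} \log\big(Z_{\mathrm{G}}(\mathsf{N}_{N,R})\big)\Big],$$

which is a consequence of Jensen's inequality. For more information on these types of issues we refer to, e.g., [67].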
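The gap between the two quantities can be made concrete with a toy ensemble. In the sketch below the "partition function" is simply modeled as $Z = \exp(N s)$ with $s$ drawn uniformly from $[0.2, 1.0]$; this ensemble, the sample size, and the variable names are assumptions made purely for illustration and do not correspond to any code ensemble in the paper.

```python
import math
import random

rng = random.Random(0)
N = 20  # hypothetical size parameter

# Hypothetical ensemble of partition function values: Z = exp(N * s), s ~ Uniform[0.2, 1.0].
samples = [math.exp(N * rng.uniform(0.2, 1.0)) for _ in range(10000)]

# (1/N) log E[Z]: dominated by the rare, very large values of Z.
log_of_mean = math.log(sum(samples) / len(samples)) / N

# E[(1/N) log Z]: reflects the typical value of Z.
mean_of_log = sum(math.log(z) for z in samples) / len(samples) / N

print(log_of_mean, mean_of_log)
```

In this toy model the first quantity is pulled towards the upper end of the range of $s$, whereas the second one stays near the mean $0.6$, in line with the direction of the inequality.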



Let us conclude this subsection by mentioning a recent paper by Mori [37] that was inspired by an earlier version of the present paper and that offers an alternative (and simpler) approach to some computations that are done in the context of the replica method. For more details we refer to Mori's paper. See also [38].

VIII. NFGs for Channel Coding

The main purpose of this section is to introduce some notation and concepts that will be useful for the next two sections, namely for Section IX on graph-cover decoding and for Section X on a connection between the minimum Hamming distance of a code and the non-concavity of the Bethe entropy function of some graphical model that represents this code.

For the following definition, we remind the reader of the sets $\mathcal{C}$ and $\mathcal{C}_{\mathrm{half}}$ that were specified in Definition 3 and the modified Gibbs partition function that was specified in Lemma 10.

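Definition 36 Let $\mathcal{X}$ be some finite set and let $\mathcal{C}_{\mathrm{ch}}$ be a length-$n$ channel code over $\mathcal{X}$, i.e., $\mathcal{C}_{\mathrm{ch}} \subseteq \mathcal{X}^n$. We say that an NFG $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ represents the code $\mathcal{C}_{\mathrm{ch}}$ if the following four conditions are satisfied.

• All local functions are indicator functions.

• For every $e \in \mathcal{E}_{\mathrm{half}}$ we have $\mathcal{A}_e = \mathcal{X}$.

• The code $\mathcal{C}_{\mathrm{ch}}$ is the projection of $\mathcal{C}$ to $\mathcal{E}_{\mathrm{half}}$, i.e.,

$$\mathcal{C}_{\mathrm{ch}} = \mathcal{C}_{\mathrm{half}} = \{ (c_e)_{e \in \mathcal{E}_{\mathrm{half}}} \mid c \in \mathcal{C} \}.$$

• There is a $t_{\mathsf{N}} \in \mathbb{Z}_{>0}$ such that the modified Gibbs partition function of Lemma 10 takes the value $t_{\mathsf{N}}$ for all $x \in \mathcal{C}_{\mathrm{half}}$, i.e., for every $x \in \mathcal{C}_{\mathrm{half}}$ there are $t_{\mathsf{N}}$ valid configurations $c \in \mathcal{C}$ whose restriction to $\mathcal{E}_{\mathrm{half}}$ equals $x$. □

Let us comment on this definition.

• One can verify that the last condition is always satisfied in the following important special case: namely the case where all edge alphabets are equal to some group and all local functions represent indicator functions of subgroups of this group. (The proof of this statement uses the fact that all cosets of a subgroup have the same size. We leave the details to the reader.)

• Note that in Example 4, for every $x \in \mathcal{C}_{\mathrm{half}}$, there are $t_{\mathsf{N}} = 4$ valid configurations in $\mathcal{C}$ whose restriction to $\mathcal{E}_{\mathrm{half}}$ equals $x$.

• Usually, an NFG that represents a code is set up such that $t_{\mathsf{N}} = 1$. However, sometimes it is more natural to set up $\mathsf{N}$ such that $t_{\mathsf{N}} > 1$. For a more detailed discussion of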
this and related issues we refer the interested reader to, 
e.g., JMl- 

Example 37 Consider the length-10 code $\mathcal{C}_{\mathrm{ch}}$ over $\mathbb{F}_2$ defined by the parity-check matrix

1110 1110' 

1110 10 11 
1110 1110 

1 1 1 1 1 1 
1 1 1 1 1 1
i.e., $\mathcal{C}_{\mathrm{ch}} \triangleq \{ x \in \mathbb{F}_2^{10} \mid H \cdot x^{\mathsf{T}} = 0^{\mathsf{T}} \text{ (in } \mathbb{F}_2) \}$, where vectors are row vectors and where $(\,\cdot\,)^{\mathsf{T}}$ denotes vector transposition. This code can be represented by the NFG shown in Fig. 11 (left). Here, all edge alphabets are equal to $\mathbb{F}_2$, all function nodes on the left-hand side represent indicator function nodes of repetition codes, and all function nodes on the right-hand side represent indicator function nodes of single parity-check codes. It can easily be verified that for this NFG we have $t_{\mathsf{N}} = 1$. □

This example is formalized in the following definition. 

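Definition 38 Consider a code $\mathcal{C}_{\mathrm{ch}}$ over $\mathbb{F}_2$ defined by some parity-check matrix $H = [h_{j,i}]_{j \in \mathcal{J}, i \in \mathcal{I}}$, where $\mathcal{J}$ and $\mathcal{I}$ are the sets of row and column indices of $H$, respectively. The code $\mathcal{C}_{\mathrm{ch}}$ can be represented by an NFG $\mathsf{N}(H) \triangleq \mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ as follows.

• The set of local function nodes is $\mathcal{F} \triangleq \mathcal{I} \cup \mathcal{J}$.

• The set of edges is $\mathcal{E} \triangleq \mathcal{E}_{\mathrm{half}} \cup \mathcal{E}_{\mathrm{full}}$, where $\mathcal{E}_{\mathrm{half}} \triangleq \mathcal{I}$ and where $\mathcal{E}_{\mathrm{full}} \triangleq \{ (i, j) \in \mathcal{I} \times \mathcal{J} \mid h_{j,i} = 1 \}$.

• For every $e \in \mathcal{E}$, the edge alphabet is $\mathcal{A}_e \triangleq \mathbb{F}_2$.

• For every $i \in \mathcal{I}$, the local function $g_i$ equals the indicator function of a length-$(|\mathcal{E}_i| + 1)$ repetition code.

• For every $j \in \mathcal{J}$, the local function $g_j$ equals the indicator function of a length-$|\mathcal{E}_j|$ single parity-check code.

If the parity-check matrix $H$ is such that all columns of $H$ have Hamming weight $d_{\mathrm{L}}$ and all rows of $H$ have Hamming weight $d_{\mathrm{R}}$, then $H$ is called a $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular parity-check matrix. (For example, the parity-check matrix $H$ in Example 37 is $(3, 6)$-regular.) If the parity-check matrix $H$ is sparsely populated then the code $\mathcal{C}_{\mathrm{ch}}$ is called a low-density parity-check (LDPC) code. Consequently, if the parity-check matrix $H$ of an LDPC code is $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular then $\mathcal{C}_{\mathrm{ch}}$ is called a $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC code, otherwise $\mathcal{C}_{\mathrm{ch}}$ is called an irregular LDPC code. □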
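The construction in Definition 38 is straightforward to mirror in code. The following sketch uses a small hypothetical $(2, 4)$-regular parity-check matrix (it is not the matrix of Example 37, whose entries are not reproduced here) and shows how the half-edges, the full edges, the regularity parameters, and the code itself can be read off from $H$. All function names are introduced here for illustration.

```python
from itertools import product

def regularity(H):
    """Return (d_L, d_R) if all column weights and all row weights agree, else None."""
    col_weights = {sum(row[i] for row in H) for i in range(len(H[0]))}
    row_weights = {sum(row) for row in H}
    if len(col_weights) == 1 and len(row_weights) == 1:
        return (col_weights.pop(), row_weights.pop())
    return None

def nfg_edges(H):
    """Half-edges are the column indices i; full edges are the pairs (i, j) with h_{j,i} = 1."""
    half = list(range(len(H[0])))
    full = [(i, j) for j, row in enumerate(H) for i, h in enumerate(row) if h == 1]
    return half, full

def codewords(H):
    """Brute-force list of all x with H x^T = 0 over F_2 (only sensible for short codes)."""
    n = len(H[0])
    return [x for x in product((0, 1), repeat=n)
            if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]

# Hypothetical (2, 4)-regular parity-check matrix of a length-6 code.
H = [[1, 1, 1, 1, 0, 0],
     [1, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 1, 1]]

half_edges, full_edges = nfg_edges(H)
print(regularity(H), len(half_edges), len(full_edges), len(codewords(H)))
```

The number of full edges equals the number of ones in $H$. Since the three rows of this particular $H$ sum to zero over $\mathbb{F}_2$, the matrix has rank 2 and the code has $2^{6-2} = 16$ codewords.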

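The following definition is a generalization of the definition of the fundamental polytope and the fundamental cone in [4], [5].

Definition 39 Let $\mathcal{C}_{\mathrm{ch}}$ be a code over $\mathbb{F}_2$, let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG that represents $\mathcal{C}_{\mathrm{ch}}$, and let $\mathcal{B}$ be the local marginal polytope of $\mathsf{N}$. We define the fundamental polytope $\mathcal{P}$ and the fundamental cone $\mathcal{K}$ to be, respectively,

$$\mathcal{P} \triangleq \{ (\beta_{e,1})_{e \in \mathcal{E}_{\mathrm{half}}} \mid \beta \in \mathcal{B} \},$$

$$\mathcal{K} \triangleq \operatorname{conic}(\mathcal{P}).$$

Elements of $\mathcal{P}$ and $\mathcal{K}$ are called pseudo-codewords. □

It can easily be verified that $\operatorname{conv}(\mathcal{C}_{\mathrm{ch}}) \subseteq \mathcal{P}$, i.e., that the fundamental polytope is a relaxation of the convex hull of the set of codewords. (Here the codewords are assumed to be embedded in $\mathbb{R}^n$, where $n$ is the length of the code.)

The following definition is taken from [69].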

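Definition 40 Let $\mathcal{C}_{\mathrm{ch}}$ be a code over $\mathbb{F}_2$, let $\mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ be an NFG that represents $\mathcal{C}_{\mathrm{ch}}$, and let $\mathcal{B}$ be the local marginal polytope of $\mathsf{N}$.

Fig. 11. Left: NFG representing the code $\mathcal{C}_{\mathrm{ch}}$ in Example 37. Right: normal factor graph $\mathsf{N}(y)$ that is discussed in Example 42.

• Let $\psi$ be the surjective mapping

$$\psi : \mathcal{B} \to \mathcal{P}, \quad \beta \mapsto (\beta_{e,1})_{e \in \mathcal{E}_{\mathrm{half}}}.$$

(Clearly, in general there are many $\beta \in \mathcal{B}$ that map to the same pseudo-codeword in $\mathcal{P}$.)

• Let $\Psi_{\mathrm{BME}}$ be the mapping

$$\Psi_{\mathrm{BME}} : \mathcal{P} \to \mathcal{B}, \quad \omega \mapsto \arg\max_{\beta \in \mathcal{B} :\, \psi(\beta) = \omega} H_{\mathrm{B}}(\beta), \tag{35}$$

where "BME" stands for "Bethe Max-Entropy." This mapping gives for each $\omega \in \mathcal{P}$ the $\beta$ among all the $\psi$-pre-images of $\omega$ that has the maximal Bethe entropy function value.

• The induced Bethe entropy function is defined to be

$$H_{\mathrm{B}} : \mathcal{P} \to \mathbb{R}, \quad \omega \mapsto H_{\mathrm{B}}\big(\Psi_{\mathrm{BME}}(\omega)\big).$$

(Note that the argument of $H_{\mathrm{B}}$ determines if $H_{\mathrm{B}}$ denotes the Bethe entropy function or the induced Bethe entropy function.) □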

IX. Graph-Cover Decoding 

As discussed in Section I and shown in Figs. 1 and 2, graph-cover decoding is a theoretical tool to connect a variety of known decoders. In this section, we first review blockwise maximum a-posteriori decoding (BMAPD), which will set the stage for discussing blockwise graph-cover decoding (BGCD). Afterwards, we review symbolwise maximum a-posteriori decoding (SMAPD), upon which we introduce symbolwise graph-cover decoding (SGCD). These decoders are summarized in Tables I and II.

Note that blockwise graph-cover decoding was simply called graph-cover decoding in [5, Sec. 4] and that the exposition here is slightly more general than in [5] because we do not restrict ourselves to binary codes.

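Definition 41 The setup in this section is as follows. (See also the upcoming Example 42.) We consider a discrete

TABLE I
Expressions for the pseudo-marginal vectors that appear in a unified formulation of BMAPD, BGCD, SMAPD, and SGCD in terms of the global function of the NFG $\mathsf{N} = \mathsf{N}(y)$ and its finite graph covers. (For every $M \in \mathbb{Z}_{>0}$, the scalar $Z'_M(y) \in \mathbb{R}_{>0}$ is some suitably defined constant.)

           | maximum a-posteriori decoding | graph-cover decoding
blockwise  | $\beta^{\mathrm{BMAPD}}(y) \triangleq \varphi_M\big( \arg\max_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M,\, \tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})} g_{\tilde{\mathsf{N}}}(\tilde{c}) \big) \big|_{M=1}$ | $\beta^{\mathrm{BGCD}}(y) \triangleq \lim_{M \to \infty} \varphi_M\big( \arg\max_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M,\, \tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})} g_{\tilde{\mathsf{N}}}(\tilde{c}) \big)$
symbolwise | $\beta^{\mathrm{SMAPD}}(y) \triangleq \frac{1}{Z'_M(y)} \sum_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})} g_{\tilde{\mathsf{N}}}(\tilde{c}) \cdot \varphi_M(\tilde{\mathsf{N}}, \tilde{c}) \big|_{M=1}$ | $\beta^{\mathrm{SGCD}}(y) \triangleq \lim_{M \to \infty} \frac{1}{Z'_M(y)} \sum_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})} g_{\tilde{\mathsf{N}}}(\tilde{c}) \cdot \varphi_M(\tilde{\mathsf{N}}, \tilde{c})$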

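TABLE II
Expressions for the pseudo-marginal vectors that appear in a unified formulation of BMAPD, BGCD, SMAPD, and SGCD in terms of Gibbs and Bethe free energy functions. Here, if $\beta \in \operatorname{conv}(\mathcal{B}'_1)$, i.e., $\beta$ is a globally realizable pseudo-marginal vector corresponding to $p \in \Pi_{\mathcal{C}}$, we define $F_{\mathrm{G}}(\beta) \triangleq F_{\mathrm{G}}(p)$. (See also Remark 26.)

           | maximum a-posteriori decoding | graph-cover decoding
blockwise  | $\beta^{\mathrm{BMAPD}}(y) = \arg\min_{\beta \in \operatorname{conv}(\mathcal{B}'_1)} F_{\mathrm{G}}(\beta) \big|_{T=0}$ | $\beta^{\mathrm{BGCD}}(y) = \arg\min_{\beta \in \mathcal{B}} F_{\mathrm{B}}(\beta) \big|_{T=0}$
symbolwise | $\beta^{\mathrm{SMAPD}}(y) = \arg\min_{\beta \in \operatorname{conv}(\mathcal{B}'_1)} F_{\mathrm{G}}(\beta) \big|_{T=1}$ | $\beta^{\mathrm{SGCD}}(y) = \arg\min_{\beta \in \mathcal{B}} F_{\mathrm{B}}(\beta) \big|_{T=1}$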

memoryless channel with an arbitrary input alphabet $\mathcal{X}$, an arbitrary output alphabet $\mathcal{Y}$, and an arbitrary channel law $\{ W(y|x) \}_{x \in \mathcal{X},\, y \in \mathcal{Y}}$, i.e., the probability of observing the symbol $y \in \mathcal{Y}$ at the channel output given that the symbol $x \in \mathcal{X}$ was sent is $W(y|x)$. Moreover, let $\mathcal{C}_{\mathrm{ch}}$ be a block code of length $n$ and with alphabet $\mathcal{X}$ that is used for data transmission over this discrete memoryless channel. We let $X \triangleq (X_1, \ldots, X_n)$ and $Y \triangleq (Y_1, \ldots, Y_n)$ be the random vectors corresponding to, respectively, the channel input and output symbols of $n$ channel uses. We assume that a codeword $x \in \mathcal{C}_{\mathrm{ch}}$ is selected with probability $P_X(x)$. (Of course, $P_X(x) = 0$ for $x \notin \mathcal{C}_{\mathrm{ch}}$.) The joint probability mass function of $X$ and $Y$ is then given by

$$P_{X,Y}(x, y) = P_X(x) \cdot P_{Y|X}(y|x) = P_X(x) \cdot \prod_{i \in [n]} W(y_i | x_i).$$

For a given channel output vector $y = (y_i)_{i \in [n]} \in \mathcal{Y}^n$, consider an NFG $\mathsf{N}(y) \triangleq \mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ with the following properties.

• For all $e \in \mathcal{E}_{\mathrm{half}}$ we have $\mathcal{A}_e \triangleq \mathcal{X}$.

• We identify $\mathcal{E}_{\mathrm{half}}$ with $[n]$.

• We identify $\{ a_e \}_{e \in \mathcal{E}_{\mathrm{half}}}$ with $\{ x_i \}_{i \in [n]}$.

• In order to take the received vector $y$ into account, some function nodes are parameterized by $y_i$, $i \in [n]$.

• For every codeword $x \in \mathcal{C}_{\mathrm{ch}}$, there is exactly one valid configuration $c \in \mathcal{C}$ such that the restriction of $c$ to $\mathcal{E}_{\mathrm{half}}$ equals $x$. This valid configuration will be denoted by $c(x)$. In terms of Definition 36 this means that we impose $t_{\mathsf{N}} = 1$. (With the necessary care, the results of this section can be generalized to NFGs for which there exists a constant $t_{\mathsf{N}}$ with $t_{\mathsf{N}} > 1$.)

• There is some constant $\gamma \in \mathbb{R}_{>0}$ such that for every codeword $x \in \mathcal{C}_{\mathrm{ch}}$, the global function value of the valid configuration $c(x)$ is

$$g\big(c(x)\big) = \gamma \cdot P_{X,Y}(x, y). \tag{36}$$

□

Example 42 Consider again the code $\mathcal{C}_{\mathrm{ch}}$ from Example 37 (which was represented by the NFG in Fig. 11 (left)), along with some discrete memoryless channel with input alphabet $\mathcal{X} = \mathbb{F}_2$, output alphabet $\mathcal{Y}$, and channel law $\{ W(y|x) \}_{x \in \mathcal{X},\, y \in \mathcal{Y}}$. Let $y \in \mathcal{Y}^{10}$ be a given channel output vector. It can be verified that Fig. 11 (right) shows a possible NFG $\mathsf{N}(y)$ that has the properties specified in Definition 41. □

Potential ties in upcoming "arg max" and "arg min" expressions are assumed to be resolved in a systematic or random manner.

A. Blockwise Maximum A-Posteriori Decoding 

With the setup as in Definition 41, let $\hat{x}^{\mathrm{BMAPD}}(y) \in \mathcal{X}^n$ be the decision vector obtained by BMAPD based on the received vector $y$. (Recall that BMAPD is the decision rule that minimizes the block decision error probability $\Pr\big(\hat{x}^{\mathrm{BMAPD}}(Y) \neq X\big)$.)

Definition 43 Given a channel output vector $y$, BMAPD yields the decision rule

$$\hat{x}^{\mathrm{BMAPD}}(y) \triangleq \arg\max_{x} P_{X,Y}(x, y). \tag{37}$$

□

On the side, we note that if all codewords are selected equally likely, i.e., $P_X(x) = \frac{1}{|\mathcal{C}_{\mathrm{ch}}|} \cdot [x \in \mathcal{C}_{\mathrm{ch}}]$, then this decision rule equals the blockwise maximum likelihood decoding rule.

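Lemma 44 Given a channel output vector $y$, consider the NFG $\mathsf{N} \triangleq \mathsf{N}(y)$ from Definition 41. The vector $\hat{x}^{\mathrm{BMAPD}}(y)$ satisfies

$$\hat{x}^{\mathrm{BMAPD}}(y) = \arg\max_{c_{\mathrm{half}}} \max_{c_{\mathrm{full}}} g(c_{\mathrm{half}}, c_{\mathrm{full}})$$

and

$$c\big(\hat{x}^{\mathrm{BMAPD}}(y)\big) = \arg\max_{c \in \mathcal{C}} g(c).$$

In terms of the pseudo-marginal vector $\beta^{\mathrm{BMAPD}}(y)$ that is defined in Table I one can therefore write

$$\beta^{\mathrm{BMAPD}}(y) = \varphi_1\big(\mathsf{N}, c(\hat{x}^{\mathrm{BMAPD}}(y))\big),$$

and so

$$\hat{x}_e^{\mathrm{BMAPD}}(y) = \arg\max_{a_e} \beta^{\mathrm{BMAPD}}_{e,a_e}(y), \quad e \in \mathcal{E}_{\mathrm{half}}.$$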

Proof Follows from ^ and Note that Mi = {N{y)}. 

m 

As shown in the following lemma, the BMAPD rule can also be cast as a Gibbs free energy function minimization problem (with temperature $T = 0$).

Lemma 45 Given a channel output vector $y$, consider the NFG $\mathsf{N} \triangleq \mathsf{N}(y)$ from Definition 41 and define

$$\hat{p} \triangleq \arg\min_{p \in \Pi_{\mathcal{C}}} F_{\mathrm{G}}(p) \Big|_{T=0}.$$

The vector $\hat{p}$ is such that

$$\hat{p}_c = \begin{cases} 1 & \text{if } c = c\big(\hat{x}^{\mathrm{BMAPD}}(y)\big) \\ 0 & \text{otherwise.} \end{cases}$$

Proof: Follows from Definition 7, Eqs. (36) and (37), and the fact that $F_{\mathrm{G}}(p) = U_{\mathrm{G}}(p)$ for temperature $T = 0$. ■

From Lemma 45, Remark 26, and the fact that $\tilde{\mathcal{N}}_1 = \{ \mathsf{N}(y) \}$, it follows that $\beta^{\mathrm{BMAPD}}(y)$ can also be written as shown in Table II.

B. Blockwise Graph-Cover Decoding 

We consider the setup as in Definition 41. Recall that BMAPD can be seen as a competition of all codewords to be the best explanation of the observed channel output vector. In this subsection we revisit blockwise graph-cover decoding (BGCD), which was originally introduced in [5, Section 4]. Actually, we will define this decoder slightly differently than in [5, Section 4]. Namely, whereas in [5, Section 4] all codewords in all finite covers of an NFG were competing to be the best explanation of a channel output vector, here we restrict the competition to all codewords in all $M$-covers of an NFG, and then we let $M$ go to infinity.



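Definition 46 Given a channel output vector $y$, consider the NFG $\mathsf{N} \triangleq \mathsf{N}(y)$ from Definition 41. For any $M \in \mathbb{Z}_{>0}$, we define degree-$M$ BGCD to be the decoding rule that gives back the pseudo-marginal vector

$$\beta^{\mathrm{BGCD}}_M(y) \triangleq \varphi_M\big(\hat{\tilde{\mathsf{N}}}, \hat{\tilde{c}}\big),$$

where

$$\big(\hat{\tilde{\mathsf{N}}}, \hat{\tilde{c}}\big) \triangleq \arg\max_{\tilde{\mathsf{N}} \in \tilde{\mathcal{N}}_M,\, \tilde{c} \in \mathcal{C}(\tilde{\mathsf{N}})} g_{\tilde{\mathsf{N}}}(\tilde{c}). \tag{38}$$

In the limit $M \to \infty$, we define BGCD to be the decoding rule that gives back the pseudo-marginal vector

$$\beta^{\mathrm{BGCD}}(y) \triangleq \lim_{M \to \infty} \beta^{\mathrm{BGCD}}_M(y).$$

(This latter expression is also shown in Table I.) □

In the case $\mathcal{X} = \mathbb{F}_2$, one could have defined BGCD to give back the pseudo-codeword $\psi\big(\beta^{\mathrm{BGCD}}(y)\big)$ (with suitable generalizations for other alphabets $\mathcal{X}$); however, for simplicity of notation we will not pursue this option here.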

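Theorem 47 Given a channel output vector $y$, consider the NFG $\mathsf{N} \triangleq \mathsf{N}(y)$ from Definition 41. Then

$$\beta^{\mathrm{BGCD}}(y) = \arg\min_{\beta \in \mathcal{B}} F_{\mathrm{B}}(\beta) \Big|_{T=0}.$$

Proof: This follows from Theorems 25 and 31 and the fact that $F_{\mathrm{B}}(\beta) = U_{\mathrm{B}}(\beta)$ for temperature $T = 0$. ■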

The decoder relationships that are highlighted in Fig. [T] are 
a consequence of the following observations. 

> Finding the minimum of the Bethe free energy function 

at temperature T = is equivalent to linear programming 

decoding. 

• As shown in Theorem |47] blockwise graph-cover decod- 
ing is equivalent to finding the minimum of the Bethe 
free energy function at temperature T = 0. 

• As discussed in [14], [15] and in Section I, a locally operating
algorithm like the max-product (min-sum) algorithm
"cannot distinguish" if it is operating on an NFG N or,
implicitly, on any of its covers. (In particular, note that
the fact that any finite graph cover $\tilde{N}$ of N looks locally
the same as N implies that the collection of computation
trees of $\tilde{N}$ equals the collection of computation trees of
N.) With this, BGCD can be considered to be a "model"
for the behavior of max-product (min-sum) algorithm 
decoding. Note that the connection between BGCD and 
max-product (min-sum) algorithm decoding is in general 
only an approximate one. However, in all cases where 
analytical tools are known that exactly characterize the 
behavior of the max-product algorithm decoder, the con- 
nection between the BGCD and the max-product (min- 
sum) algorithm decoder is exact. 

Note that if the NFG N does not contain cycles then
max-product algorithm decoding and linear programming decoding
yield the same decision as BMAPD [50]. This is reflected in
the equivalence of $F_{\mathrm{G}}$ and $F_{\mathrm{B}}$ for cycle-free NFGs, once the
domains of these two functions have been suitably identified.
C. Symbolwise Maximum A-Posteriori Decoding

With the setup as in Definition 41, let $\hat{\mathbf{x}}^{\mathrm{SMAPD}}(\mathbf{y}) \in \mathcal{X}^n$
be the decision vector obtained by SMAPD based on
the received vector y. Recall that SMAPD is the decision
rule that minimizes the symbol decision error probability
$\Pr\bigl(\hat{x}_i^{\mathrm{SMAPD}}(\mathbf{Y}) \neq X_i\bigr)$ for every $i \in [n]$ (or, depending on
the definition, for every i that corresponds to an information
symbol of $C_{\mathrm{ch}}$).

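Definition 48 Given a channel output vector y, SMAPD
yields the vector $\hat{\mathbf{x}}^{\mathrm{SMAPD}}(\mathbf{y})$ with components

\[ \hat{x}_i^{\mathrm{SMAPD}}(\mathbf{y}) \triangleq \arg\max_{x_i} P_{X_i|\mathbf{Y}}(x_i|\mathbf{y}) = \arg\max_{x_i} P_{X_i,\mathbf{Y}}(x_i,\mathbf{y}), \quad i \in \mathcal{I}, \]

where

\[ P_{X_i,\mathbf{Y}}(x_i,\mathbf{y}) \triangleq \sum_{\mathbf{x}':\ x'_i = x_i} P_{\mathbf{X},\mathbf{Y}}(\mathbf{x}',\mathbf{y}). \qquad (39) \]

□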



Note that $\hat{\mathbf{x}}^{\mathrm{SMAPD}}(\mathbf{y})$, in contrast to $\hat{\mathbf{x}}^{\mathrm{BMAPD}}(\mathbf{y})$, is not
always a codeword.

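Lemma 49 Given a channel output vector y, consider the
NFG $N \triangleq N(\mathbf{y})$ from Definition 41. The SMAPD vector
$\hat{\mathbf{x}}^{\mathrm{SMAPD}}(\mathbf{y})$ satisfies

\[ \hat{x}_e^{\mathrm{SMAPD}}(\mathbf{y}) = \arg\max_{a_e} \eta_e(a_e), \quad e \in \mathcal{E}_{\mathrm{half}}, \qquad (40) \]

with

\[ \eta_e(a_e) \triangleq \frac{1}{Z_{\mathrm{G}}(N)} \cdot \sum_{\mathbf{a}':\ a'_e = a_e} g(\mathbf{a}'), \quad e \in \mathcal{E}_{\mathrm{half}},\ a_e \in \mathcal{A}_e. \qquad (41) \]

In terms of the pseudo-marginal vector $\hat{\boldsymbol{\beta}}^{\mathrm{SMAPD}}(\mathbf{y})$ that is
defined in Table II, one can therefore write

\[ \eta_e(a_e) = \hat{\beta}^{\mathrm{SMAPD}}_{e,a_e}(\mathbf{y}), \quad e \in \mathcal{E}_{\mathrm{half}},\ a_e \in \mathcal{A}_e, \]

and

\[ \hat{x}_e^{\mathrm{SMAPD}}(\mathbf{y}) = \arg\max_{a_e} \hat{\beta}^{\mathrm{SMAPD}}_{e,a_e}(\mathbf{y}), \quad e \in \mathcal{E}_{\mathrm{half}}. \]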

Proof: Follows from ^ and (l39ll. Note that TVAi {N(y)}. 

■ 

From Lemma 49 it is clear that the main computational
step towards obtaining the SMAPD vector is to compute
the marginals $\{\eta_e(a_e)\}_{e \in \mathcal{E}_{\mathrm{half}},\, a_e \in \mathcal{A}_e}$. Note that the scaling
constant in (41) was chosen such that $\sum_{a_e} \eta_e(a_e) = 1$ for
every $e \in \mathcal{E}_{\mathrm{half}}$. Of course, any other positive scaling factor
works equally well as it entails the same decision in (40).
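
The marginalization behind Lemma 49 can be illustrated by brute force on a toy example. The following is only a sketch under assumptions that are not part of the paper: the code is the length-3 single parity-check code over F_2, the channel is memoryless with made-up likelihood values, and the names `CODE`, `joint`, `symbol_marginals`, `smapd`, and `LIK` are hypothetical. It shows that the decision does not depend on the positive scaling factor, and that the SMAPD vector need not be a codeword.

```python
from itertools import product

# Hypothetical toy code (an assumption, not from the paper):
# the length-3 single parity-check code over F_2.
CODE = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]

def joint(x, likelihoods):
    """Unnormalized P_{X,Y}(x, y) for a codeword x under a uniform prior."""
    p = 1.0
    for xi, lik in zip(x, likelihoods):
        p *= lik[xi]
    return p

def symbol_marginals(likelihoods, scale=None):
    """Symbol marginals eta_i(a); `scale` replaces the default 1/Z scaling."""
    z = sum(joint(x, likelihoods) for x in CODE)
    scale = 1.0 / z if scale is None else scale
    n = len(likelihoods)
    return [[scale * sum(joint(x, likelihoods) for x in CODE if x[i] == a)
             for a in (0, 1)]
            for i in range(n)]

def smapd(likelihoods, scale=None):
    """Symbolwise decision: maximize each marginal separately."""
    eta = symbol_marginals(likelihoods, scale)
    return tuple(max((0, 1), key=lambda a: eta_i[a]) for eta_i in eta)

# LIK[i][a] = P(y_i | x_i = a); made-up values, each symbol leans towards 1.
LIK = [(0.4, 0.6)] * 3
decision = smapd(LIK)
```

With these made-up likelihoods every symbol is decided as 1, so `decision` is the all-ones vector, which has odd weight and therefore is not a codeword of the toy code.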

As shown in the following lemma, the SMAPD rule can also 
be cast as a Gibbs free energy function minimization problem 
(with temperature T = 1). 

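Lemma 50 Given a channel output vector y, consider the
NFG N = N(y) from Definition 41 and define

\[ \hat{\mathbf{p}} \triangleq \arg\min_{\mathbf{p}} F_{\mathrm{G}}(\mathbf{p}). \]

Using the notation from Lemma 49 we have

\[ \eta_e(a_e) = \sum_{\mathbf{c}:\ c_e = a_e} \hat{p}_{\mathbf{c}}, \quad e \in \mathcal{E}_{\mathrm{half}},\ a_e \in \mathcal{A}_e. \]

Proof: Follows from Definition 7 and Lemmas 8 and 49. ■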

From Lemmas 49 and 50, Remark 26, and the fact that
$\tilde{\mathcal{N}}_1 = \{N(\mathbf{y})\}$, it follows that $\hat{\boldsymbol{\beta}}^{\mathrm{SMAPD}}(\mathbf{y})$ can also be written
as shown in Table II.

D. Symbolwise Graph-Cover Decoding 

We consider the setup as in Definition 41. Recall that
SMAPD is based on computing suitable marginals of the
global function represented by the NFG N(y). In this
subsection we define symbolwise graph-cover decoding (SGCD),
which was outlined in [64], [65]. Similar to the transition from
BMAPD to BGCD, where the competition is extended from all
codewords of N(y) to all codewords in all M-covers of N(y),
when going from SMAPD to SGCD we replace the marginals
of the global function of N(y) by a suitable combination of
marginals of the global functions of all M-covers of N(y).

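Definition 51 Given a channel output vector y, consider the
NFG N = N(y) from Definition 41. For any $M \in \mathbb{Z}_{>0}$, we
define degree-M SGCD to yield the pseudo-marginal vector
$\hat{\boldsymbol{\beta}}^{\mathrm{SGCD}(M)}(\mathbf{y})$ with components

\[ \hat{\beta}^{\mathrm{SGCD}(M)}_{e,a_e}(\mathbf{y}) \triangleq \eta_{e,M}(a_e), \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e, \]
\[ \hat{\beta}^{\mathrm{SGCD}(M)}_{f,\mathbf{a}_f}(\mathbf{y}) \triangleq \eta_{f,M}(\mathbf{a}_f), \quad f \in \mathcal{F},\ \mathbf{a}_f \in \mathcal{A}_f. \]

For every $e \in \mathcal{E}$, the "marginal function" $\eta_{e,M}$ is defined by

\[ \eta_{e,M}(a_e) \triangleq \frac{1}{M} \sum_{m \in [M]} \eta_{e,m,M}(a_e), \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e, \]
\[ \eta_{e,m,M}(a_e) \triangleq \frac{\sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} Z_{\mathrm{G}}(\tilde{N}) \cdot \eta_{e,m,\tilde{N}}(a_e)}{\sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} Z_{\mathrm{G}}(\tilde{N})}, \quad e \in \mathcal{E},\ m \in [M],\ a_e \in \mathcal{A}_e, \]
\[ \eta_{e,m,\tilde{N}}(a_e) \triangleq \frac{1}{Z_{\mathrm{G}}(\tilde{N})} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N}):\ \tilde{c}_{e,m} = a_e} g_{\tilde{N}}(\tilde{\mathbf{c}}), \quad e \in \mathcal{E},\ m \in [M],\ a_e \in \mathcal{A}_e, \]

where

\[ \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} Z_{\mathrm{G}}(\tilde{N}) = |\tilde{\mathcal{N}}_M| \cdot \bigl(Z_{\mathrm{B},M}(N)\bigr)^M. \qquad (42) \]

(For a motivation of these expressions, see the paragraph after
this definition.) For every $f \in \mathcal{F}$, the "marginal function"
$\eta_{f,M}$ is defined analogously. Moreover, taking the limit
$M \to \infty$, we define SGCD to be the decoder that gives back the
pseudo-marginal vector

\[ \hat{\boldsymbol{\beta}}^{\mathrm{SGCD}}(\mathbf{y}) \triangleq \lim_{M \to \infty} \hat{\boldsymbol{\beta}}^{\mathrm{SGCD}(M)}(\mathbf{y}). \]

□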
Here are the motivations for specifying the "marginal
functions" as we did in Definition 51.

• Fix an arbitrary M-cover $\tilde{N}$ of N. If we were to find
the SMAPD estimate of the code defined by $\tilde{N}$ then,
following Lemma 49, we would have to compute the
marginals $\eta_{e,m,\tilde{N}}(a_e)$.

• However, given the fact that no M-cover is more special
than any other M-cover, we take the average of these
marginals over all M-covers and obtain the marginals
$\eta_{e,m,M}(a_e)$. Actually, we take a weighted average where
the weighting factor for $\tilde{N}$ is chosen to be its Gibbs
partition function $Z_{\mathrm{G}}(\tilde{N})$.

• From the symmetries of the setup it is clear that for every
$e \in \mathcal{E}$ and every $a_e \in \mathcal{A}_e$ the quantity $\eta_{e,m,M}(a_e)$ is
independent of $m \in [M]$. Therefore, the definition of the
marginals $\eta_{e,M}(a_e)$ is somewhat trivial, but notationally
and analytically useful.

• The scaling factors were chosen such that the marginals
sum to 1, when summed over their corresponding alphabets.
(Note that the reformulation of the normalization constant
on the right-hand side of (42) follows from Definition 32.)
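
The weighted averaging over all M-covers can be made concrete by exhaustive enumeration on a very small NFG. The sketch below rests on several assumptions that are ours and not the paper's: temperature T = 1, a toy NFG with two function nodes joined by three parallel binary full-edges, made-up local functions `f0` and `f1`, and the reading that an M-cover is specified by one permutation of [M] per full-edge. The function name `sgcd_edge_marginals` is hypothetical. It returns the Gibbs-partition-function-weighted marginals for every edge copy together with the M-th root of the average Gibbs partition function of the covers.

```python
from itertools import permutations, product
from math import prod

# Hypothetical toy NFG (an assumption for illustration): two function
# nodes joined by EDGES parallel full-edges with binary alphabets.
EDGES = 3

def f0(a):
    """Made-up positive local function of node 0."""
    return 1.0 + a[0] + 2.0 * a[1] * a[2]

def f1(a):
    """Made-up positive local function of node 1 (favors even weight)."""
    return 2.0 if sum(a) % 2 == 0 else 0.5

def sgcd_edge_marginals(M):
    """Return (eta, z_bm) with eta[e][m][a] the weighted cover marginal."""
    perms = list(permutations(range(M)))
    total = 0.0                      # sum of Z_G over all M-covers
    n_covers = 0
    acc = [[[0.0, 0.0] for _ in range(M)] for _ in range(EDGES)]
    for sigma in product(perms, repeat=EDGES):   # one M-cover per choice
        n_covers += 1
        for flat in product((0, 1), repeat=EDGES * M):
            c = [flat[e * M:(e + 1) * M] for e in range(EDGES)]  # c[e][m]
            # Copy m of node 0 sees edge copies (e, m).
            g = prod(f0([c[e][m] for e in range(EDGES)]) for m in range(M))
            # Edge copy (e, m) ends at copy sigma[e][m] of node 1.
            for m1 in range(M):
                g *= f1([c[e][sigma[e].index(m1)] for e in range(EDGES)])
            total += g
            for e in range(EDGES):
                for m in range(M):
                    acc[e][m][c[e][m]] += g
    eta = [[[v / total for v in acc[e][m]] for m in range(M)]
           for e in range(EDGES)]
    z_bm = (total / n_covers) ** (1.0 / M)
    return eta, z_bm

eta1, z1 = sgcd_edge_marginals(1)    # M = 1: the NFG itself
eta2, z2 = sgcd_edge_marginals(2)    # all 2-covers
```

For M = 1 the only cover is the NFG itself, so the routine reduces to ordinary marginalization. For M = 2 one can check numerically that the marginals sum to 1 and do not depend on the copy index m, as claimed in the bullets above.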

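Theorem 52 Given a channel output vector y, consider the
NFG N(y) from Definition 41 and define

\[ \hat{\boldsymbol{\beta}} \triangleq \arg\min_{\boldsymbol{\beta} \in \mathcal{B}} F_{\mathrm{B}}(\boldsymbol{\beta}) \Big|_{T=1}. \]

Using the notation from Definition 51 we have

\[ \lim_{M \to \infty} \eta_{e,M}(a_e) = \hat{\beta}_{e,a_e}, \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e, \]
\[ \lim_{M \to \infty} \eta_{f,M}(\mathbf{a}_f) = \hat{\beta}_{f,\mathbf{a}_f}, \quad f \in \mathcal{F},\ \mathbf{a}_f \in \mathcal{A}_f. \]

Proof: See Appendix C. ■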

In the rather special case where $F_{\mathrm{B}}$ has multiple global
minima (necessarily of equal value), Theorem 52 has to be
stated somewhat more carefully. Namely, the pseudo-marginal
vector $\hat{\boldsymbol{\beta}}$ has to be replaced by a suitable pseudo-marginal
vector in the convex hull of all pseudo-marginal vectors that
minimize $F_{\mathrm{B}}$.

The decoder relationships that are highlighted in Fig. 2 are
a consequence of the following observations.

• As shown in Theorem 52, symbolwise graph-cover decoding
is equivalent to finding the minimum of the Bethe
free energy function at temperature T = 1.

• As discussed in [14], [15] and in Section I, a locally operating
algorithm like the sum-product algorithm "cannot
distinguish" if it is operating on an NFG N or, implicitly,
on any of its covers. (Again, note that the fact that any
finite graph cover $\tilde{N}$ of N looks locally the same as N
implies that the collection of computation trees of $\tilde{N}$
equals the collection of computation trees of N.) Therefore,
SGCD can be considered to be a "model" for the
behavior of sum-product algorithm decoding. Note that
the connection between SGCD and SPA decoding is in
general only an approximate one. However, in many cases
where analytical tools are known that exactly characterize
the behavior of the SPA decoder, the connection between
SGCD and SPA decoding is exact.

Note that if the NFG N(y) does not contain cycles then
sum-product algorithm decoding yields the same (pseudo-)marginal
vector as SMAPD. This is reflected in the equivalence of $F_{\mathrm{G}}$
and $F_{\mathrm{B}}$ for cycle-free NFGs, once the domains of these two
functions have been suitably identified.

For an NFG without cycles, the meaning of the
pseudo-marginal functions that are computed by the sum-product
algorithm is clear (cf. the discussion at the beginning of
Section I), but for an NFG with cycles, the meaning of these
pseudo-marginal functions is a priori less clear. However,
combining Theorems 30 and 52 with the theorem by Yedidia,
Freeman, and Weiss [3] on the characterization of fixed
points of the sum-product algorithm, one obtains the following
statement. Namely, a fixed point of the sum-product algorithm
corresponds to a certain pseudo-marginal vector of the factor
graph under consideration: it is, after taking a biasing
channel-output-dependent term properly into account, the
pseudo-marginal vector that has (locally) an extremal number of
pre-images in all M-covers, when M goes to infinity.*

X. The Influence of the Minimum Hamming 
Distance of a Code upon the Bethe Entropy 
Function of its NFG 

It can easily be verified that the Gibbs entropy function 
is a concave function of its arguments. However, the Bethe 
entropy function is in general not a concave function of its 
arguments. This has important consequences when trying to 
minimize the Bethe free energy function because the curvature 
of this function is determined by the curvature of the Bethe 
entropy function. 

In this section we show that choosing a code from an 
ensemble of regular LDPC codes with minimum Hamming 
distance growing (with high probability) linearly with the 
block length comes at the price of having to deal with an 
NFG whose Bethe entropy function is concave and convex 
and whose Bethe free energy function is, therefore, convex and 
concave. (By a multidimensional function being "concave and 
convex" we mean that there are points and directions where the 
function is locally concave, and points and directions where 
the function is locally convex.) Moreover, we show that the 
choice of a code from such an ensemble has implications for 
the accuracy of the pseudo-marginals that are computed by the 
sum-product algorithm. 

* In this statement we included the word "locally" because the sum-product
algorithm can get stuck at a local extremum of the Bethe free energy function.
Note that here the use of the word "local" is different than the use of
the word "local" when comparing the global perspective of maximum
a-posteriori decoding with the local perspective of message-passing iterative
decoding. However, ultimately, this "local" is also a consequence of the
suboptimal behavior of message-passing iterative decoding stemming from
its local perspective.
We conjecture that the above results are also valid for
ensembles of irregular LDPC codes whose minimum Hamming
distance grows (with high probability) linearly with the block
length; however, we prove the above statements only for the
case of regular LDPC codes.

This section is structured as follows. In Section IX-A we
make a simple observation about the induced Bethe entropy
function for regular LDPC codes. Afterwards, in Section IX-B,
we discuss how this observation implies the above-mentioned
results.

A. An Observation about the Induced Bethe Entropy Function 

In this subsection we consider a setup where the expression
for the induced Bethe entropy (see Definition 40) can be
simplified significantly. Afterwards we will recognize that
the obtained expression appears also in some other context.
This will then lead to the promised conclusions, which are
discussed in the next subsection.

Recall the definition of a $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC code from
Definition 38. Note that the rate of such a code is lower
bounded by $1 - d_{\mathrm{L}}/d_{\mathrm{R}}$.

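Lemma 53 Consider a $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular length-n LDPC code
over $\mathbb{F}_2$ described by some parity-check matrix H, and let
N(H) be the NFG associated with H as in Definition 38.
Then the induced Bethe entropy function along the straight
line

\[ \boldsymbol{\omega}(s) \triangleq \omega(s) \cdot (1, \ldots, 1), \quad s \in \mathbb{R}, \]

evaluates to

\[ H_{\mathrm{B}}\bigl(\boldsymbol{\omega}(s)\bigr) = n \cdot h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s). \]

Here we have used the functions

\begin{align*}
h_{d_{\mathrm{L}},d_{\mathrm{R}}} &: \mathbb{R} \to \mathbb{R}, \quad s \mapsto -(d_{\mathrm{L}} - 1) \cdot h\bigl(\omega(s)\bigr) - d_{\mathrm{L}} \cdot s \cdot \omega(s) + \frac{d_{\mathrm{L}}}{d_{\mathrm{R}}} \cdot \theta(s), \\
\omega &: \mathbb{R} \to \mathbb{R}, \quad s \mapsto \frac{1}{d_{\mathrm{R}}} \cdot \frac{\mathrm{d}}{\mathrm{d}s}\theta(s), \\
\theta &: \mathbb{R} \to \mathbb{R}, \quad s \mapsto \log\Biggl( \sum_{w \in \{0,1,\ldots,d_{\mathrm{R}}\},\ w \text{ even}} \binom{d_{\mathrm{R}}}{w} \exp(s \cdot w) \Biggr), \\
h &: [0,1] \to \mathbb{R}, \quad \xi \mapsto -\xi \log(\xi) - (1 - \xi) \log(1 - \xi).
\end{align*}

Proof: See Appendix D. ■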

Example 54 For $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (2,4)$ and $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (3,6)$,
the graph of $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ is visualized in Figs. 12
and 13, respectively. We make the following observations with
respect to the shapes of these curves and the values that
$h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ takes on.

• In the case $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (2,4)$, it can be verified that
the graph $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ is concave and that
$h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ is always non-negative.

• In the case $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (3,6)$, it can be verified that the
graph $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ is concave and convex.
Moreover, for small $\omega(s) > 0$ the value of $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$
is negative. (This behavior is actually typical for the
behavior of $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ for any $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular
LDPC code with $3 \le d_{\mathrm{L}} < d_{\mathrm{R}}$.) □

Fig. 12. The graph of $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ for $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (2,4)$
(horizontal axis: $\omega(s)$). The inset zooms into the area near the origin.

It is worth emphasizing that the expression for $H_{\mathrm{B}}\bigl(\boldsymbol{\omega}(s)\bigr)$
in Lemma 53 holds for any $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC code over
$\mathbb{F}_2$ of length n, i.e., it is neither an ensemble average result,
nor an asymptotic (in n) result.
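
The functions appearing in Lemma 53 are easy to evaluate numerically, which gives a quick check of the sign behavior described in Example 54. The following sketch assumes the functional forms as they are written out in Appendix D (the even-weight sum for the log-partition term and the final expression for the induced Bethe entropy per symbol); the function names `theta`, `omega`, `binary_entropy`, and `h_reg` are our own and hypothetical, and natural logarithms are used throughout.

```python
from math import comb, exp, log

def theta(s, d_r):
    """log of the even-weight sum over w of C(d_r, w) * exp(s * w)."""
    return log(sum(comb(d_r, w) * exp(s * w) for w in range(0, d_r + 1, 2)))

def omega(s, d_r):
    """(1 / d_r) times the derivative of theta with respect to s."""
    num = sum(w * comb(d_r, w) * exp(s * w) for w in range(0, d_r + 1, 2))
    den = sum(comb(d_r, w) * exp(s * w) for w in range(0, d_r + 1, 2))
    return num / (d_r * den)

def binary_entropy(xi):
    """Binary entropy in nats, with the convention 0 * log(0) = 0."""
    if xi in (0.0, 1.0):
        return 0.0
    return -xi * log(xi) - (1.0 - xi) * log(1.0 - xi)

def h_reg(s, d_l, d_r):
    """Induced Bethe entropy per symbol along omega(s) * (1, ..., 1)."""
    w = omega(s, d_r)
    return (-(d_l - 1) * binary_entropy(w)
            - d_l * s * w
            + (d_l / d_r) * theta(s, d_r))

# Small omega(s) corresponds to strongly negative s.
h_24_small = h_reg(-3.0, 2, 4)   # positive for (2,4)
h_36_small = h_reg(-3.0, 3, 6)   # negative for (3,6)
h_36_half = h_reg(0.0, 3, 6)     # omega = 1/2: equals (1 - 3/6) * log(2)
```

At s = 0 one has omega(s) = 1/2 and the value reduces to the design rate times log 2, which is a convenient sanity check on the sketch.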

Remark 55 Interestingly enough, the functions $\omega$ and $h_{d_{\mathrm{L}},d_{\mathrm{R}}}$
from Lemma 53 appear also when studying the ensemble
of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC codes with block length going to
infinity. Namely, the asymptotic growth rate of the average
number of codewords of relative Hamming weight $\omega(s)$ is
given by $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$, where the average is taken over Gallager's
ensemble of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC codes [66, Section 2].
(The same asymptotic growth rate is also obtained for the
ensemble of all $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC codes as defined by
Richardson and Urbanke ESS, see also / |67]/ . EB, itSi-) □ 

Using the interpretation of the Bethe entropy function that
was given in Section VIII, this equivalence is not totally
surprising considering the following facts. (Here, N(H) refers
to the NFG in Lemma 53.)

• Because N(H) represents a $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC code,
any finite graph cover of N(H) also represents a
$(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC code.

• A "typical" codeword of relative Hamming weight $\omega(s)$
in a finite graph cover of N(H) maps down to a
pseudo-codeword that is very close to $\omega(s) \cdot (1, \ldots, 1)$.

• The Bethe entropy function value of some pseudo-marginal
vector in the local marginal polytope of N(H)
"counts" the number of valid configurations in finite
graph covers of N(H) that map down to that pseudo-marginal
vector. (See Section VIII for a more precise
statement.)

We leave it as an open problem to find a suitable generalization
of Lemma 53 to irregular LDPC codes and ensembles of
irregular LDPC codes.

B. Implications of the Above Observation 

In this subsection we explore some of the implications of
the observation that the graph $s \mapsto \bigl(\omega(s), h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)\bigr)$ appears
in two different setups, namely in the setup of Lemma 53 and
in the setup of Remark 55. For this discussion, recall that a
function with a multi-dimensional domain is called concave if
at every point of its domain the function is concave in every
direction.

Let us first consider Gallager's ensemble of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular
LDPC codes where $3 \le d_{\mathrm{L}} < d_{\mathrm{R}}$. It was already observed by
Gallager [66] that codes from this ensemble have a minimum
Hamming distance that grows (with high probability) linearly
with the block length. A necessary condition for this to
happen is that the function $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ is negative for small
$\omega(s) > 0$ (see Fig. 13 for the case $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (3,6)$). Because
$h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s) = 0$ for $\omega(s) = 0$ and because $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s) > 0$
for sufficiently large $0 < \omega(s) < 1$, the function $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$
must be a convex function of $\omega(s)$ for small $\omega(s)$. Combining
this observation with Lemma 53 and Remark 55 yields the
conclusion that the induced Bethe entropy function of an NFG
of a code from this ensemble is concave and convex. (See the
beginning of this section for our definition of "concave and
convex.") A slightly more involved analysis then also yields
the conclusion that the Bethe entropy function of an NFG of
a code from this ensemble is concave and convex.
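
The step from the sign pattern of $h_{d_{\mathrm{L}},d_{\mathrm{R}}}$ to the existence of a non-concave region can be spelled out by a short contradiction argument. This is a sketch in notation of our own: $\bar{h}(\omega)$ denotes $h_{d_{\mathrm{L}},d_{\mathrm{R}}}$ viewed as a function of $\omega = \omega(s)$, and $\omega_1$ is any point with $\bar{h}(\omega_1) > 0$; neither symbol appears in the original text.

```latex
% Suppose, for contradiction, that \bar h is concave on [0, \omega_1],
% where \bar h(0) = 0 and \bar h(\omega_1) > 0. Concavity gives
\[
  \bar h(\omega)
  \;\ge\; \Bigl(1 - \frac{\omega}{\omega_1}\Bigr)\,\bar h(0)
          + \frac{\omega}{\omega_1}\,\bar h(\omega_1)
  \;=\; \frac{\omega}{\omega_1}\,\bar h(\omega_1)
  \;>\; 0,
  \qquad 0 < \omega \le \omega_1 .
\]
% This contradicts \bar h(\omega) < 0 for small \omega > 0. Hence \bar h
% cannot be concave on all of [0, \omega_1].
```

The argument only shows that concavity must fail somewhere on the interval; locating the convex region near the origin requires the explicit form of the function.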

Still talking about Gallager's ensemble of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular
LDPC codes where $3 \le d_{\mathrm{L}} < d_{\mathrm{R}}$, these observations
also have consequences for the computation of pseudo-marginal
vectors with the help of fixed points of the sum-product
algorithm. Recall that the theorem by Yedidia, Freeman,
and Weiss [3] showed that fixed points of the sum-product
algorithm correspond to stationary points of the Bethe free
energy function. Now, the fact that the Bethe entropy function
is not concave everywhere implies that the Bethe free energy
function is not convex everywhere; in particular, it is not convex
in the vicinity of pseudo-marginal vectors that correspond
to codewords. To continue our argument, assume that the
received vector is such that the true marginal vector is close to
the marginal vector corresponding to some codeword. In order
for the sum-product algorithm to be able to somewhat closely
reproduce this true marginal vector, the sum-product algorithm
would have to have a stable fixed point with a pseudo-marginal
vector somewhat close to this marginal vector, i.e.,
the Bethe free energy function would need to have a local
minimum at a pseudo-marginal vector somewhat close to this
marginal vector. However, the above non-convexity results for
the Bethe free energy function show that this is not possible
for every true marginal vector.* In conclusion, the accuracy
of the sum-product-algorithm-based estimation of marginal
vectors of NFGs of regular LDPC codes from ensembles
with linearly growing minimum Hamming distance has its
limitations. However, if only 0-vs.-1 decisions are important
(as is very often the case in channel coding theory) then
these limitations are usually not that severe.

For completeness, let us also briefly discuss Gallager's
ensemble of $(d_{\mathrm{L}}, d_{\mathrm{R}})$-regular LDPC codes where $2 = d_{\mathrm{L}} < d_{\mathrm{R}}$.
As pointed out by Gallager [66], codes from this ensemble
have a minimum Hamming distance that grows at most
logarithmically with the block length. This is also reflected
by the fact that $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ is positive for small $\omega(s) > 0$
(see Fig. 12 for the case $(d_{\mathrm{L}}, d_{\mathrm{R}}) = (2,4)$). (This statement is
not strong enough to prove concavity of $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ in $\omega(s)$. For
establishing this, a detailed analysis of $h_{d_{\mathrm{L}},d_{\mathrm{R}}}(s)$ as a function
of $\omega(s)$ is necessary.)

XI. Conclusions 

We have shown that it is possible to give a combinatorial 
characterization of the Bethe entropy function and the Bethe 
partition function, two functions that were originally defined 
analytically. The key was to study finite graph covers of 
the NFG under consideration, in particular to count valid 
configurations in these finite graph covers. Moreover, we have 
introduced a theoretical tool called symbolwise graph-cover 
decoding that helps to better understand the meaning of the 
pseudo-marginal vector at fixed points of the sum-product 
algorithm. For all these results, the main mathematical tool 
that we used was the method of types. 

We finish with a few remarks. 

• It is clear that all the results that were stated in this paper
for temperature T = 1 can be suitably generalized to any
temperature $T \in \mathbb{R}_{>0}$.

• The fractional Bethe approximation (see, e.g., [73]) and
the Kikuchi approximation (see, e.g., [74]) are usually
better approximations than the Bethe approximation.
Generalizing the results of the present paper, we have
outlined a combinatorial characterization of the entropy
function of these approximations in [75].

• Although the main application of symbolwise graph-cover
decoding is in obtaining a better understanding of
fixed points of the sum-product algorithm, one wonders
if the transient and the periodic behavior of the sum-product
algorithm can also be characterized in terms of graph
covers (or variations thereof). Some initial results in that
direction were sketched in [76].

• It might be interesting to study the influence of redundant
parity-checks of LDPC codes upon the Bethe entropy
function.

* In fact, the operation of the sum-product algorithm on the NFG of a
code from this ensemble behaves such that once it has "locked into" some
codeword, the pseudo-marginal vector produced by the sum-product algorithm
gets closer and closer to the marginal vector corresponding to that
codeword.
Appendix A
Proof of Lemma 29

Similar to the proof of Lemma 20, a possibility to draw an
M-cover of N is to first draw M copies of every function node
of N, then to draw edges that suitably connect these function
nodes, and finally, where required, to attach half-edges to the
function nodes.

In this proof, we will use this drawing procedure to guide
the counting process. Namely, we start by drawing M copies
of each function node, along with the sockets to which the
edges will be attached later on. Let us count in how many ways
we can specify $\{\mathbf{a}_{f,m}\}_{f \in \mathcal{F},\, m \in [M]}$ that are consistent with $\boldsymbol{\beta}$.
(We will call these locally valid configurations.) It can easily
be seen that there are $\prod_{f \in \mathcal{F}} \binom{M}{M \cdot \boldsymbol{\beta}_f}$ ways to do this.

Fix such a locally valid configuration that is consistent with
$\boldsymbol{\beta}$. Let us count how many graph covers $\tilde{N} \in \tilde{\mathcal{N}}_M$ specify
an edge connection such that this locally valid configuration
induces a valid configuration in $\tilde{N}$. By this we mean that
there is a unique valid configuration $\tilde{\mathbf{c}} \in C(\tilde{N})$ such that the
following holds.

• For every full-edge $(e, m) \in \mathcal{E}_{\mathrm{full}} \times [M]$ we have

\[ a_{f',m',e} = \tilde{c}_{e,m} = a_{f'',m'',e}. \]

(Here we assumed that the full-edge (e, m) connects the
two function nodes (f', m') and (f'', m'').)

• For every half-edge $(e, m) \in \mathcal{E}_{\mathrm{half}} \times [M]$ we have

\[ a_{f,m,e} = \tilde{c}_{e,m}. \]

(Here we assumed that the half-edge (e, m) is connected
to the function node (f, m).)

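There are precisely $\prod_{e \in \mathcal{E}_{\mathrm{full}}} \prod_{a_e} (M \beta_{e,a_e})!$ such M-covers.
This can be seen as follows. Namely, consider some full-edge
$e \in \mathcal{E}_{\mathrm{full}}$ that connects the two function nodes f' and f''
and fix some $a_e \in \mathcal{A}_e$. It follows from the edge consistency
constraints of the local marginal polytope that $\boldsymbol{\beta}$ is such that

\[ \sum_{\mathbf{a}_{f'}:\ a_{f',e} = a_e} \beta_{f',\mathbf{a}_{f'}} = \beta_{e,a_e} = \sum_{\mathbf{a}_{f''}:\ a_{f'',e} = a_e} \beta_{f'',\mathbf{a}_{f''}}. \]

Therefore, the number of e-sockets among the M copies of
f' that take on the value $a_e$ is $M \cdot \beta_{e,a_e}$, and the number of
e-sockets among the M copies of f'' that take on the value
$a_e$ is also $M \cdot \beta_{e,a_e}$. These sockets can be connected by $M \cdot \beta_{e,a_e}$
edges in exactly $(M \beta_{e,a_e})!$ ways.

Note that the number $\prod_{e \in \mathcal{E}_{\mathrm{full}}} \prod_{a_e} (M \beta_{e,a_e})!$ is independent
of the chosen locally valid configuration, and so, among
all M-covers, the total number of valid configurations that map
down to $\boldsymbol{\beta}$ equals $\prod_{f \in \mathcal{F}} \binom{M}{M \cdot \boldsymbol{\beta}_f} \cdot \prod_{e \in \mathcal{E}_{\mathrm{full}}} \prod_{a_e} (M \beta_{e,a_e})!$. The
lemma statement is then obtained by dividing this number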
by jA^f |, by using the result in (flST l. and by applying the 
abbreviations that are defined in (|22]|-(|23T|. 
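
The multinomial coefficients that count locally valid configurations grow exponentially in M, with an exponent given by an entropy; this is the basic estimate of the method of types. The sketch below is our own illustration and not part of the proof: the names `log_multinomial`, `entropy`, and `type_rate` are hypothetical, the distribution `P` is made up, and M is chosen so that all entries of M times `P` are integers.

```python
from math import lgamma, log

def log_multinomial(counts):
    """Natural log of the multinomial coefficient sum(counts)! / prod(k!)."""
    m = sum(counts)
    return lgamma(m + 1) - sum(lgamma(k + 1) for k in counts)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * log(x) for x in p if x > 0)

def type_rate(p, M):
    """(1/M) * log of the number of length-M sequences with type p."""
    counts = [round(M * x) for x in p]
    return log_multinomial(counts) / M

P = (0.5, 0.25, 0.25)          # made-up local distribution
gap_small = entropy(P) - type_rate(P, 40)
gap_large = entropy(P) - type_rate(P, 4000)
```

Both gaps are positive and the second is much smaller than the first, in line with a correction term that is sublinear in M.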



Appendix B 
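Proof of Theorem 33

We start by reformulating the Mth power of $Z_{\mathrm{B},M}(N)$.
Namely, we have

\begin{align*}
\bigl(Z_{\mathrm{B},M}(N)\bigr)^M
&\overset{(a)}{=} \bigl\langle Z_{\mathrm{G}}(\tilde{N}) \bigr\rangle_{\tilde{N} \in \tilde{\mathcal{N}}_M} \\
&\overset{(b)}{=} \frac{1}{|\tilde{\mathcal{N}}_M|} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} g_{\tilde{N}}(\tilde{\mathbf{c}})^{1/T} \\
&\overset{(c)}{=} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \frac{1}{|\tilde{\mathcal{N}}_M|} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} \bigl[\boldsymbol{\varphi}_M(\tilde{N},\tilde{\mathbf{c}}) = \boldsymbol{\beta}\bigr] \cdot g_{\tilde{N}}(\tilde{\mathbf{c}})^{1/T} \\
&\overset{(d)}{=} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(M/T) \cdot U_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \cdot \frac{1}{|\tilde{\mathcal{N}}_M|} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} \bigl[\boldsymbol{\varphi}_M(\tilde{N},\tilde{\mathbf{c}}) = \boldsymbol{\beta}\bigr] \\
&\overset{(e)}{=} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(M/T) \cdot U_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \cdot \bar{C}_M(\boldsymbol{\beta}) \\
&\overset{(f)}{=} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(M/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(M) \bigr), \qquad (43)
\end{align*}

where at step (a) we have used Definition 32, where at step (b)
we have used (4) and Definition 27, where at step (c) we
have used Definitions 22 and 24, where at step (d) we have
used Theorem 31, where at step (e) we have used Eq. (21) and
Definitions 27 and 28, and where at step (f) we have used (24).
Consequently we obtain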

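\begin{align*}
\limsup_{M \to \infty} Z_{\mathrm{B},M}(N)
&\overset{(a)}{=} \limsup_{M \to \infty} \sqrt[M]{\sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(M/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(M) \bigr)} \\
&\overset{(b)}{=} \limsup_{M \to \infty} \sqrt[M]{\max_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(M/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(M) \bigr)} \\
&\overset{(c)}{=} \limsup_{M \to \infty} \max_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -(1/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(1) \bigr) \\
&\overset{(d)}{=} \sup_{\boldsymbol{\beta} \in \mathcal{B}'} \exp\bigl( -(1/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \\
&= \exp\Bigl( -\frac{1}{T} \cdot \inf_{\boldsymbol{\beta} \in \mathcal{B}'} F_{\mathrm{B}}(\boldsymbol{\beta}) \Bigr) \\
&\overset{(e)}{=} \exp\Bigl( -\frac{1}{T} \cdot \min_{\boldsymbol{\beta} \in \mathcal{B}} F_{\mathrm{B}}(\boldsymbol{\beta}) \Bigr) \\
&\overset{(f)}{=} Z_{\mathrm{B}}(N),
\end{align*}

where at step (a) we have used (43), where in step (b) the
replacement of the sum by the maximum operator is justified
by the fact that the size of the set $\mathcal{B}'_M$ grows polynomially in
M (see Remark 26), where step (c) follows from taking the
maximization operator out of the Mth root, where in step (d)
we have used the definition of $\mathcal{B}'$, the set of lift-realizable
pseudo-marginal vectors, where at step (e) we have used the
fact that the closure of $\mathcal{B}'$ equals $\mathcal{B}$, which is a consequence
of Theorem 25, and the fact that $\mathcal{B}$ is compact so that the
infimum operator can be replaced by a minimization operator,
and where at step (f) we have used Definition 15. This is the
result that was promised in the theorem statement.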
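
Step (b) above rests on a standard sandwich argument, which can be written out as follows. This is a sketch in our own notation: $t_M(\boldsymbol{\beta})$ abbreviates the non-negative summand $\exp\bigl(-(M/T) \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(M)\bigr)$, and the polynomial bound $|\mathcal{B}'_M| \le c \cdot M^k$ with constants $c, k > 0$ is an assumed concrete form of the polynomial growth mentioned in Remark 26.

```latex
\[
  \max_{\beta \in \mathcal{B}'_M} t_M(\beta)
  \;\le\; \sum_{\beta \in \mathcal{B}'_M} t_M(\beta)
  \;\le\; |\mathcal{B}'_M| \cdot \max_{\beta \in \mathcal{B}'_M} t_M(\beta),
\]
% Taking M-th roots, the extra factor satisfies
\[
  1 \;\le\; |\mathcal{B}'_M|^{1/M}
    \;\le\; \bigl(c \, M^{k}\bigr)^{1/M}
    \;=\; \exp\!\Bigl(\frac{\log c + k \log M}{M}\Bigr)
    \;\longrightarrow\; 1
  \quad (M \to \infty),
\]
% so the M-th roots of the sum and of the maximum have the same limsup.
```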

Appendix C 
Proof of Theorem 52

From the derivations in this appendix it will be apparent
that there are close connections to the proof of Theorem 33
in Appendix B.

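For any $e \in \mathcal{E}$, any $M \in \mathbb{Z}_{>0}$, and any $a_e \in \mathcal{A}_e$ we have

\begin{align*}
\eta_{e,M}(a_e)
&\overset{(a)}{=} \frac{1}{M} \sum_{m \in [M]} \eta_{e,m,M}(a_e) \\
&\overset{(b)}{=} \frac{1}{M} \sum_{m \in [M]} \frac{1}{|\tilde{\mathcal{N}}_M| \cdot (Z_{\mathrm{B},M}(N))^M} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} Z_{\mathrm{G}}(\tilde{N}) \cdot \eta_{e,m,\tilde{N}}(a_e) \\
&\overset{(c)}{=} \frac{1}{|\tilde{\mathcal{N}}_M| \cdot (Z_{\mathrm{B},M}(N))^M} \cdot \frac{1}{M} \sum_{m \in [M]} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N}):\ \tilde{c}_{e,m} = a_e} g_{\tilde{N}}(\tilde{\mathbf{c}}) \\
&\overset{(d)}{=} \frac{1}{|\tilde{\mathcal{N}}_M| \cdot (Z_{\mathrm{B},M}(N))^M} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} \bigl[\boldsymbol{\varphi}_M(\tilde{N},\tilde{\mathbf{c}}) = \boldsymbol{\beta}\bigr] \cdot g_{\tilde{N}}(\tilde{\mathbf{c}}) \cdot \frac{1}{M} \sum_{m \in [M]} \bigl[\tilde{c}_{e,m} = a_e\bigr] \\
&\overset{(e)}{=} \frac{1}{|\tilde{\mathcal{N}}_M| \cdot (Z_{\mathrm{B},M}(N))^M} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \exp\bigl( -M \cdot U_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} \bigl[\boldsymbol{\varphi}_M(\tilde{N},\tilde{\mathbf{c}}) = \boldsymbol{\beta}\bigr] \cdot \frac{1}{M} \sum_{m \in [M]} \bigl[\tilde{c}_{e,m} = a_e\bigr] \\
&= \frac{1}{|\tilde{\mathcal{N}}_M| \cdot (Z_{\mathrm{B},M}(N))^M} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \beta_{e,a_e} \cdot \exp\bigl( -M \cdot U_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \sum_{\tilde{N} \in \tilde{\mathcal{N}}_M} \sum_{\tilde{\mathbf{c}} \in C(\tilde{N})} \bigl[\boldsymbol{\varphi}_M(\tilde{N},\tilde{\mathbf{c}}) = \boldsymbol{\beta}\bigr] \\
&\overset{(f)}{=} \frac{1}{(Z_{\mathrm{B},M}(N))^M} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \beta_{e,a_e} \cdot \exp\bigl( -M \cdot U_{\mathrm{B}}(\boldsymbol{\beta}) \bigr) \cdot \bar{C}_M(\boldsymbol{\beta}) \\
&\overset{(g)}{=} \frac{1}{(Z_{\mathrm{B},M}(N))^M} \sum_{\boldsymbol{\beta} \in \mathcal{B}'_M} \beta_{e,a_e} \cdot \exp\bigl( -M \cdot F_{\mathrm{B}}(\boldsymbol{\beta}) + o(M) \bigr),
\end{align*}

where at steps (a), (b), and (c) we have used Definition 51,
where at step (d) we have used Definitions 22 and 24, where
at step (e) we have used Theorem 31, where at step (f) we
have used Eq. (21) and Definitions 27 and 28, and where at
step (g) we have used (24).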

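The next step is to evaluate $\eta_{e,M}(a_e)$ in the limit $M \to \infty$.
Because the size of the set $\mathcal{B}'_M$ grows polynomially in
M (see Remark 26), we can use an approach similar to the
one that was used in Section IV to simplify (10) in the limit
$M \to \infty$. We obtain

\[ \lim_{M \to \infty} \eta_{e,M}(a_e) = \gamma_e \cdot \hat{\beta}_{e,a_e}, \quad e \in \mathcal{E},\ a_e \in \mathcal{A}_e, \]

where $\gamma_e \in \mathbb{R}_{>0}$ is some suitable constant, and where

\[ \hat{\boldsymbol{\beta}} \triangleq \arg\min_{\boldsymbol{\beta} \in \mathcal{B}} F_{\mathrm{B}}(\boldsymbol{\beta}) \Big|_{T=1}. \]

Actually, $\gamma_e = 1$ because $\eta_{e,\infty}(a_e)$ was defined such that
$\sum_{a_e} \eta_{e,\infty}(a_e) = 1$ for all $e \in \mathcal{E}$.

For any $f \in \mathcal{F}$ and any $\mathbf{a}_f \in \mathcal{A}_f$, the proof of the second
statement in Theorem 52 is nearly identical to the above proof.
We omit the details.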

Appendix D 
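Proof of Lemma 53

Recall the definition of the Bethe entropy function from
Definition 14 and the induced Bethe entropy function from
Definition 40. Fix some pseudo-codeword $\boldsymbol{\omega} = \omega \cdot (1, \ldots, 1) \in \mathcal{P}$,
$0 \le \omega \le 1$, and let $\boldsymbol{\beta}^* \triangleq \boldsymbol{\psi}_{\mathrm{BME}}(\boldsymbol{\omega})$. We have to evaluate

\begin{align*}
H_{\mathrm{B}}(\boldsymbol{\omega})
&= \sum_{f \in \mathcal{F}} H_{\mathrm{B},f}(\boldsymbol{\beta}^*_f) - \sum_{e \in \mathcal{E}_{\mathrm{full}}} H_{\mathrm{B},e}(\boldsymbol{\beta}^*_e) \\
&= \sum_{i \in \mathcal{I}} H_{\mathrm{B},i}(\boldsymbol{\beta}^*_i) + \sum_{j \in \mathcal{J}} H_{\mathrm{B},j}(\boldsymbol{\beta}^*_j) - \sum_{e \in \mathcal{E}_{\mathrm{full}}} H_{\mathrm{B},e}(\boldsymbol{\beta}^*_e).
\end{align*}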


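Clearly, for every $i \in \mathcal{I}$ we have $\beta^*_{i,(0,\ldots,0)} = 1 - \omega$ and
$\beta^*_{i,(1,\ldots,1)} = \omega$, and so

\[ H_{\mathrm{B},i}(\boldsymbol{\beta}^*_i) = h(\omega), \quad i \in \mathcal{I}. \qquad (44) \]

Moreover, the edge consistency constraints of $\mathcal{B}$ imply that for
every $e \in \mathcal{E}_{\mathrm{full}}$ it holds that $\beta^*_{e,0} = 1 - \omega$ and $\beta^*_{e,1} = \omega$, and
so

\[ H_{\mathrm{B},e}(\boldsymbol{\beta}^*_e) = h(\omega), \quad e \in \mathcal{E}_{\mathrm{full}}. \qquad (45) \]

The computations for $H_{\mathrm{B},j}(\boldsymbol{\beta}^*_j)$ are more involved because we
need to find the maximizing $\boldsymbol{\beta} = \boldsymbol{\beta}^*$ in (35).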

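These computations are simplified by the observation that
$H_{\mathrm{B},j}(\boldsymbol{\beta}_j)$ can be maximized for every $j \in \mathcal{J}$ separately.
Therefore, let us fix some $j \in \mathcal{J}$. We have to maximize

\[ H_{\mathrm{B},j}(\boldsymbol{\beta}_j) = -\sum_{\mathbf{a}_j \in \mathcal{A}_j} \beta_{j,\mathbf{a}_j} \log(\beta_{j,\mathbf{a}_j}) \qquad (46) \]

under the constraints

\begin{align*}
\sum_{\mathbf{a}_j:\ a_{j,e} = 1} \beta_{j,\mathbf{a}_j} &= \omega, \quad e \in \mathcal{E}_j, \qquad (47) \\
\sum_{\mathbf{a}_j \in \mathcal{A}_j} \beta_{j,\mathbf{a}_j} &= 1, \qquad (48)
\end{align*}

where the constraints in (47) are implied by the edge
consistency constraints of $\mathcal{B}$. (Strictly speaking, we also have to
impose the inequalities $0 \le \beta_{j,\mathbf{a}_j} \le 1$ for all $\mathbf{a}_j \in \mathcal{A}_j$; however,
we will see that the solution satisfies them automatically.)
Introducing Lagrange multipliers $\{s_{j,e}\}_{e \in \mathcal{E}_j}$ and $\nu_j$, we obtain
the Lagrangian

\[ L = -\sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} \log(\beta_{j,\mathbf{a}_j}) + \sum_{e \in \mathcal{E}_j} s_{j,e} \cdot \Bigl( \sum_{\mathbf{a}_j:\ a_{j,e} = 1} \beta_{j,\mathbf{a}_j} - \omega \Bigr) + \nu_j \cdot \Bigl( \sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} - 1 \Bigr). \]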
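Because of the concavity of the Lagrangian in $\{\beta_{j,\mathbf{a}_j}\}_{\mathbf{a}_j}$, and
because of the symmetry of the setup (i.e., the symmetry of the
single parity-check code and the symmetry of the constraints),
all Lagrange multipliers $\{s_{j,e}\}_{e \in \mathcal{E}_j}$ must take on the same
value, say $s_j$. Therefore, the new Lagrangian is

\begin{align*}
L &= -\sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} \log(\beta_{j,\mathbf{a}_j}) + s_j \cdot \sum_{e \in \mathcal{E}_j} \Bigl( \sum_{\mathbf{a}_j:\ a_{j,e} = 1} \beta_{j,\mathbf{a}_j} - \omega \Bigr) + \nu_j \cdot \Bigl( \sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} - 1 \Bigr) \\
&\overset{(a)}{=} -\sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} \log(\beta_{j,\mathbf{a}_j}) + s_j \cdot \Bigl( \sum_{\mathbf{a}_j} w_{\mathrm{H}}(\mathbf{a}_j) \cdot \beta_{j,\mathbf{a}_j} - d_{\mathrm{R}} \omega \Bigr) + \nu_j \cdot \Bigl( \sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} - 1 \Bigr),
\end{align*}

where $w_{\mathrm{H}}(\mathbf{a}_j)$ denotes the Hamming weight of $\mathbf{a}_j$, and where
at step (a) we have used $|\mathcal{E}_j| = d_{\mathrm{R}}$. Computing the gradient of
the Lagrangian with respect to $\{\beta_{j,\mathbf{a}_j}\}_{\mathbf{a}_j}$, and setting it equal
to the zero vector, we obtain

\[ -\log(\beta^*_{j,\mathbf{a}_j}) - 1 + s_j \cdot w_{\mathrm{H}}(\mathbf{a}_j) + \nu_j = 0, \quad \mathbf{a}_j \in \mathcal{A}_j. \]

Therefore,

\[ \beta^*_{j,\mathbf{a}_j} = \frac{\exp\bigl( s_j \cdot w_{\mathrm{H}}(\mathbf{a}_j) \bigr)}{\exp\bigl( \theta_j(s_j) \bigr)}, \quad \mathbf{a}_j \in \mathcal{A}_j. \qquad (49) \]

We define

\[ \theta_j(s_j) \triangleq \log\Bigl( \sum_{\mathbf{a}_j \in \mathcal{A}_j} \exp\bigl( s_j \cdot w_{\mathrm{H}}(\mathbf{a}_j) \bigr) \Bigr). \qquad (50) \]

Then the sum of all the constraints in (47) implies

\begin{align*}
d_{\mathrm{R}} \omega
&= \sum_{e \in \mathcal{E}_j} \sum_{\mathbf{a}_j:\ a_{j,e} = 1} \beta^*_{j,\mathbf{a}_j}
 = \sum_{\mathbf{a}_j} w_{\mathrm{H}}(\mathbf{a}_j) \cdot \beta^*_{j,\mathbf{a}_j} \\
&\overset{(a)}{=} \sum_{\mathbf{a}_j} w_{\mathrm{H}}(\mathbf{a}_j) \cdot \frac{\exp\bigl( s_j \cdot w_{\mathrm{H}}(\mathbf{a}_j) \bigr)}{\exp\bigl( \theta_j(s_j) \bigr)}
 \overset{(b)}{=} \frac{\mathrm{d}}{\mathrm{d}s_j} \theta_j(s_j), \qquad (51)
\end{align*}

where at step (a) we have used (49), and where at step (b) we
have used (50). Solving for $\omega$ we obtain

\[ \omega = \omega^{(j)}(s_j) \triangleq \frac{1}{d_{\mathrm{R}}} \cdot \frac{\mathrm{d}}{\mathrm{d}s_j} \theta_j(s_j). \qquad (52) \]

With this, the entropy expression in (46) can be rewritten to
read

\begin{align*}
H_{\mathrm{B},j}(\boldsymbol{\beta}^*_j)
&= -\sum_{\mathbf{a}_j} \beta^*_{j,\mathbf{a}_j} \log(\beta^*_{j,\mathbf{a}_j})
 \overset{(a)}{=} -d_{\mathrm{R}} \cdot s_j \cdot \omega + \theta_j(s_j) \\
&\overset{(b)}{=} -d_{\mathrm{R}} \cdot s_j \cdot \omega^{(j)}(s_j) + \theta_j(s_j), \qquad (53)
\end{align*}

where at step (a) we have used (49), (50), and (51), and where
at step (b) we have used (52). It can be verified that this is
indeed the maximal value of (46) under the constraints in
(47)-(48).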
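
The closed-form maximizer can be checked numerically on a single parity-check code. The sketch below is only an illustration under our own naming (`local_max_entropy` and `check` are hypothetical): it builds the exponential-family distribution on the even-weight words of length d_R, computes its single-coordinate marginals and its entropy, and compares the entropy with the expression obtained above in terms of the multiplier, the common marginal, and the log-partition term.

```python
from itertools import product
from math import exp, log

def local_max_entropy(s, d_r):
    """Distribution proportional to exp(s * weight) on even-weight words."""
    words = [a for a in product((0, 1), repeat=d_r) if sum(a) % 2 == 0]
    weights = [exp(s * sum(a)) for a in words]
    z = sum(weights)                       # equals exp(theta_j(s))
    beta = {a: w / z for a, w in zip(words, weights)}
    return beta, log(z)

def check(s, d_r):
    """Return (marginals, entropy, theta) for the distribution above."""
    beta, theta = local_max_entropy(s, d_r)
    # Probability that coordinate e equals 1, for every coordinate e.
    marg = [sum(p for a, p in beta.items() if a[e] == 1) for e in range(d_r)]
    entropy = -sum(p * log(p) for p in beta.values())
    return marg, entropy, theta

S, D_R = -1.0, 4                           # made-up parameter values
marg, entropy, theta = check(S, D_R)
closed_form = -D_R * S * marg[0] + theta   # right-hand side of the entropy formula
```

By symmetry all coordinates have the same marginal, so the distribution satisfies the marginal constraints with a single value, and `entropy` agrees with `closed_form` up to rounding.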

Note that 9j{sj) is a strictly convex function in Sj (see, e.g., 
P9| ). and so ■^dj{sj) is a strictly monotonically increasing 
function in $s_j$. This implies that for every $j \in \mathcal{J}$ there is a
unique $s_j$ such that $\omega = \omega^{(j)}(s_j)$.

Because all function nodes $j \in \mathcal{J}$ have the same degree, it
is clear that the functions $\theta_j$ and $\omega^{(j)}$ are independent of j.
This implies that there is an $s \in \mathbb{R}$ such that $s_j = s$ for all
$j \in \mathcal{J}$. It also implies that $H_{\mathrm{B},j}(\boldsymbol{\beta}^*_j)$ is independent of j.

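Finally, adding up all entropy terms, the induced Bethe
entropy equals

\begin{align*}
H_{\mathrm{B}}(\boldsymbol{\omega})
&= \sum_{i \in \mathcal{I}} H_{\mathrm{B},i}(\boldsymbol{\beta}^*_i) + \sum_{j \in \mathcal{J}} H_{\mathrm{B},j}(\boldsymbol{\beta}^*_j) - \sum_{e \in \mathcal{E}_{\mathrm{full}}} H_{\mathrm{B},e}(\boldsymbol{\beta}^*_e) \\
&\overset{(a)}{=} \sum_{i \in \mathcal{I}} h(\omega) - \sum_{j \in \mathcal{J}} d_{\mathrm{R}} \cdot s \cdot \omega(s) + \sum_{j \in \mathcal{J}} \theta(s) - \sum_{e \in \mathcal{E}_{\mathrm{full}}} h(\omega) \\
&\overset{(b)}{=} -n \cdot (d_{\mathrm{L}} - 1) \cdot h\bigl(\omega(s)\bigr) - n \cdot d_{\mathrm{L}} \cdot s \cdot \omega(s) + n \cdot \frac{d_{\mathrm{L}}}{d_{\mathrm{R}}} \cdot \theta(s),
\end{align*}

where at step (a) we have used (44), (45), and (53), and where
at step (b) we have used $|\mathcal{I}| = n$, $|\mathcal{E}_{\mathrm{full}}| = n \cdot d_{\mathrm{L}}$, and
$|\mathcal{J}| = |\mathcal{E}_{\mathrm{full}}| / d_{\mathrm{R}} = n \cdot d_{\mathrm{L}} / d_{\mathrm{R}}$.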

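The proof of this lemma is then concluded by observing
that $\theta_j(s_j)$ in (50) can also be written as

\[ \theta_j(s_j) = \log\Biggl( \sum_{w \in \{0,1,\ldots,d_{\mathrm{R}}\},\ w \text{ even}} \binom{d_{\mathrm{R}}}{w} \exp(s_j \cdot w) \Biggr), \qquad (54) \]

where we have used the fact that the local constraint code $\mathcal{A}_j$
contains $\binom{d_{\mathrm{R}}}{w}$ codewords of weight w if $w \in \{0, 1, \ldots, d_{\mathrm{R}}\}$
is even, and 0 codewords of weight w if $w \in \{0, 1, \ldots, d_{\mathrm{R}}\}$
is odd.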